EDITOR'S NOTE


The reader may notice some inconsistency with respect to the rules of
punctuation used in this volume. This situation was brought about by the fact
that the copies of papers read during the conference that were submitted to the
Editor of this volume have been reproduced in the form in which they were
submitted.

In most cases, the reports reproduced in this volume were presented to
the conference by their authors. Three of the reports were co-authored. Dr.
Sticht presented the report in Chapter II, Dr. Foulke presented the report in
Chapter III, and Dr. Friedman presented the report in Chapter X.

Dr. S. J. Campanella and Dr. de Hoop accepted the invitation to present
reports to the conference. However, because of unforeseen conflicts, they
were unable to attend. The reports they were to have presented are reproduced
in Chapters XVI and XVII.

The report in Chapter V, written by Donald Wing Hathaway and based on data
supplied by the Project Director, Rose Diamond, was presented at the conference by
Dr. Kinney. Upon conclusion of the written report, Dr. Kinney added some
remarks of his own, and these are also reproduced in Chapter V.

Most  of  Mr.  Allen's  report,  reproduced  in  Chapter  XI,  was  recorded  on 
magnetic  tape  and  reproduced  at  the  conference  on  the  Eltro  Information  Rate 
Changer  Mark  II,  the  speech  compressor  marketed  in  this  country  by  the  Gotham 
Audio Corporation. This permitted him, by making adjustments in the amount
of  compression  or  expansion,  as  the  tape  was  reproduced,  to  demonstrate  the 
capability  of  the  Eltro  Information  Rate  Changer. 

Since  there  was  considerable  overlap  in  the  references  cited  by  authors, 
it  was  decided  not  to  include  a  list  of  references  at  the  end  of  each  report. 
Rather,  a  single  list  of  references  was  compiled  from  the  references  cited  by 
each  author  and  is  included  at  the  end  of  the  reports.   This  list  of  references, 
though  possibly  not  bibliographic  in  scope,  is  extensive,  and  it  is  hoped  that 
it  will  serve  as  a  valuable  resource  to  those  wishing  to  read  in  the  area. 
INTRODUCTION

As  human  beings,  we  depend  heavily,  and  perhaps  more  than  we  realize,  on 
spoken  language  for  the  communication  that  supports  daily  living.   Because  aural 
communication  is  such  an  integral  part  of  daily  living,  we  have  tended  to  take 
it  for  granted,  and  have  not  subjected  it  to  the  same  kind  of  scrutiny  that  has 
been  applied  to  recognized  communication  systems  such  as  the  writing  and  reading 
of language. However, in recent years, aural communication has begun to receive
more specific attention from educators and researchers. Examples of this increased
interest are the courses offered in many schools for the purpose of improving
listening skills.

A  special  interest  in  aural  communication  has  been  expressed  by  those  who, 
for  whatever  reason,  must  place  extraordinary  reliance  upon  what  they  hear  in 
order  to  communicate.   Blind  school  children,  for  instance,  depend  heavily  upon 
recorded  language  because  they  do  not  have  access  to  the  system  of  communication 
built  around  the  print  letter  code,  and  because  the  rate  at  which  braille  can 
be read is very slow when compared with the visual reading rate. Also, because
auditory information is more easily and economically transmitted than visual
information by wire or radio, there are many situations, such as military
operations, in which exclusive reliance is placed on the aural communication
of crucial messages.

One  consequence  of  the  increased  interest  in  the  process  of  aural 
communication  has  been  a  significant  advance  in  the  technology  associated  with 
the recording and reproduction of speech. As is true in the case of communication
by reading, a variable of obvious interest to those concerned with the process
of aural communication has been the rate at which it occurs. Without special
intervention, aural communication is governed by the rate at which speakers produce
words. However, certain advantages might be gained if this rate could be altered.
If it could be increased without a sacrifice in comprehension, the savings in time
might be quite valuable to those who must rely heavily upon aural communication.
If it could be reduced without a loss in the intelligibility of words, mentally
retarded children or students of a foreign language might find the slower speech
helpful.

The  first  method  for  altering  the  word  rate  of  recorded  speech  to  receive 
the  attention  of  investigators  was  that  of  reproducing  a  tape  or  record  at  a 
different speed than the speed used during recording. This method achieves the
desired effect as far as word rate is concerned. Reproduction at a faster speed
increases word rate, while reproduction at a slower speed reduces word rate.
However, in either case, serious distortion is introduced that soon renders
speech unintelligible. Fortunately, another method, pioneered by Dr. Grant
Fairbanks (1954) at the University of Illinois, was introduced. This is a sampling
method in which brief segments of a recorded message are discarded, and the
resulting gaps in the message are eliminated. It has been found that if the
discarded segments are brief enough, the listener is not aware of their absence,
and what he hears is speech that sounds normal in all respects except that the
word rate has been increased. The word rate may also be reduced by repeating,
rather than discarding, samples of a recorded message as it is reproduced. The
control of speech rate that was made possible by the introduction of equipment
for sampling speech in the manner just described has encouraged a great deal of
research regarding the effect of speech rate on the intelligibility of single
words and the comprehension of connected discourse (Bixler, Foulke, Amster,
& Nolan, 1961; Fairbanks, Guttman & Miron, 1957; Fairbanks & Kodman, 1957;
Foulke, 1964; Foulke & Bixler, 1966; Friedman, Orr, Freedle & Norris, 1966;
Garvey, 1953; Orr & Friedman, 1964). Experimental attention has been directed
to a variety of questions related to the speech rate variable, and there has been an
accumulation of experimental results in support of the general conclusion that
speech may be presented at a rate of approximately 275 or 280 words per minute with
the expectation of satisfactory comprehension.


Because  of  these  findings,  many  people  have  begun  to  give  serious  consideration 
to  the  possibility  that  accelerated  speech  may  have  a  useful  role  to  play  in  many 
educational programs. Programs organized around the needs of blind children constitute
an obvious example, but there has also been an expression of interest from other
quarters,  for  instance,  those  concerned  with  college  and  public  school  education. 
The  increased  interest  of  those  who  wish  to  make  practical  use  of  the  ability  to 
control  and  vary  speech  rate  has  served  as  a  kind  of  feedback  to  researchers,  with 
the  result  that  there  has  been  a  further  increase  in  research  activity. 

On December 10 and 11, 1965, Mr. Robert Bray, Chief, Division for the Blind
and Physically Handicapped, Library of Congress, called together a group of people
interested in exploring applications of compressed or accelerated speech. In
recognition of the situation just described, the group recommended that a conference
of national scope be held for the purpose of determining the present status of
research and development with respect to the production and use of rate controlled
speech, informing interested people of its current status, and formulating plans
relating to the future development of the area. This conference was organized by
Mr. Bray and his staff at the Library of Congress and by Dr. Emerson Foulke at the
University of Louisville, with financial assistance from funds provided by the
Office of Education and administered by the American Printing House for the Blind.
The conference was convened at the University of Louisville on October 19, 20, and
21, 1966. It was attended by approximately 100 people from all parts of the nation
and from Canada (see Appendix A), with interests ranging from the use of accelerated
speech  as  a  means  of  testing  theories  of  cognitive  processing,  to  the  insertion  of 
rate  controlled  speech  into  ongoing  educational  programs. 

On the first day of the conference, reports of experiments and demonstrations
involving time compressed speech, and of equipment for the production of such
speech, were given. These reports constitute the principal contents of this volume.
On the second day, conference participants were divided into seven discussion groups,
and a chairman was appointed for each group. Group assignments were made in such
a way that the professions and interests represented in the conference at large were
proportionally represented in each group as well. Professions represented at the
conference included psychology, education, speech science, linguistics, computer
science, library science, electrical engineering, school administration, and
manufacturing and sales. Groups were instructed to range freely over the area in
discussing the problems related to the present status of time compressed or expanded
speech as a useful communication tool and its prospects for future development. They
were told to have no concern for duplication of effort, in the belief that the extent
of duplication would indicate the relative importance of the points discussed, and that
unrestricted discussion might prove more creative. On the third conference day, the
chairmen of discussion groups summarized the deliberations of their groups for the
conference at large. The final chapter in this volume has been prepared from these
summaries.

Before adjourning the conference, Mr. Bray, Chairman of the Conference,
appointed an Implementation Committee and charged it with the responsibility of promoting
the recommendations developed by the discussion groups and reported by their chairmen.
This committee has met several times since the conference, and its activity is discussed
in the final chapter of this volume.


CHAPTER II
A  Review  of  Research  on  Time  Compressed  Speech 


Emerson Foulke
Thomas  G.  Sticht* 


Introduction

Accelerated  speech  is  speech  in  which  the  word  rate  has  been  increased. 
Increasing  the  word  rate  reduces  the  time  required  for  a  given  message.   Therefore, 
accelerated  speech  is  often  referred  to  as  time  compressed,  or  simply,  compressed 
speech.   If  speech,  when  accelerated,  remains  comprehensible,  the  savings  in 
time  should  be  an  important  consideration  in  those  situations  in  which  extensive 
reliance  is  placed  upon  aural  communication.   The  expression  of  interest  in 
time  compressed  speech  came  originally  from  communication  services  where  limited 
channels  for  communication  are  available,  and  a  great  deal  of  information  must 
be  transmitted.   Obviously,  more  messages  can  be  sent  through  a  given  channel 
if  the  time  per  message  is  reduced.   Recently,  there  has  also  been  an  interest 
in  the  use  of  time  compressed  speech  in  order  to  make  available  to  blind  people 
a  reading  rate  that  compares  more  favorably  with  the  visual  reading  rate.  This 
paper  is  concerned  with  the  communication  problems  produced  by  those  operations 
that  must  be  performed  upon  speech  in  order  to  compress  it  in  time.   First,  the 
various  techniques  for  accelerating  speech  are   described,  and  then  the  methods 
which have been used for the evaluation of the intelligibility and the comprehensibility
of  time  compressed  speech  are   discussed.   There  follows  a  comparison  of  the  different 
methods  of  accelerating  speech  with  respect  to  intelligibility  and  comprehension. 
Attention  is  next  focused  on  those  characteristics  of  the  listener,  such  as  age 
and  intelligence,  that  may  have  a  bearing  upon  the  comprehension  of  accelerated 
speech.   Finally,  an  hypothesis  is  formulated  to  account  for  the  effects  of 
acceleration  on  the  comprehension  of  speech,  and  directions  for  future  research 
are suggested.

Methods  for  the  Acceleration  of  Speech 

Speaking Rapidly

Within limits, word rate is under the control of the speaker (Calearo
& Lazzaroni, 1957; deQuiros, 1964; Enc & Stolurow, 1960; Fergen, 1955; Goldstein,
1940; Harwood, 1955; Nelson, 1948). This method has the virtue of simplicity
and requires no special apparatus. However, speaking at a rate that is faster
than normal introduces undesired changes in vocal inflection and fluctuations
in rate, and a relatively low upper limit makes the method generally unsuitable.

The  Speed  Changing  Method 

The  word  rate  of  recorded  speech  may  be  changed  simply  by  reproducing 
it  at  a  different  tape  or  record  speed  than  the  one  used  during  recording.   If 
the  playback  speed  is  slower  than  the  recording  speed,  the  word  rate  is 
decreased and the speech is expanded in time. If the playback speed is
increased, the word rate is increased and the speech is compressed in time.


*Dr. Emerson Foulke is Director of the Center for Rate Controlled
Recordings and is associate professor of Psychology in the Department of
Psychology at the University of Louisville, Louisville, Kentucky, 40208.

Dr. Thomas G. Sticht is a postdoctoral Research Fellow at the
Department  of  Psychology,  University  of  Pittsburgh,  Pittsburgh,  Pennsylvania 

... speed. For instance, if the speed is doubled, the component
frequencies will be doubled, and overall vocal pitch will be raised one
octave. Speech, the rate of which has been altered by this method, has
been examined in several experiments (Fletcher, 1929, pg. 293-294; Foulke,
...; Garvey, 1953a; Klumpp & Webster, 1961; McLain, 1962).
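
The pitch shift produced by the speed changing method can be worked out directly. The sketch below is an illustration only and is not taken from the studies cited; the function name and the 120 Hz example frequency are assumptions chosen for the example. It relies solely on the relationship stated above: every component frequency is multiplied by the ratio of playback speed to recording speed, and a doubling of frequency corresponds to one octave.

```python
import math


def speed_change_effects(speed_ratio, component_hz):
    """Describe playback at `speed_ratio` times the recording speed.

    Returns a tuple of
    (word-rate multiplier, shifted component frequency in Hz, pitch shift in octaves).
    """
    if speed_ratio <= 0:
        raise ValueError("speed_ratio must be positive")
    shifted_hz = component_hz * speed_ratio   # every component scales with speed
    octaves = math.log2(speed_ratio)          # one octave per doubling of frequency
    return speed_ratio, shifted_hz, octaves


# Doubling the playback speed: word rate doubles, 120 Hz becomes 240 Hz,
# and the overall pitch rises by one octave.
rate, hz, octaves = speed_change_effects(2.0, 120.0)
```

Playing the same recording at half speed gives a ratio of 0.5, so the word rate is halved and the pitch falls by one octave.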


The  Sampling  Method 

In  1950,  Miller  and  Licklider  demonstrated  the  redundancy  in  speech 
by  deleting  segments  of  the  speech  signal.   This  was  accomplished  by  a 
switching  arrangement  that  permitted  the  speech  signal  to  be  turned  off 
periodically.   They  found  that  so  long  as  these  interruptions  occurred  more 
than  ten  times  per  second,  the  interrupted  speech  was  easily  understood. 
Intelligibility  of  monosyllabic  words  did  not  drop  below  90%  until  50%  of 
the  speech  signal  had  been  discarded.   Thus,  it  appeared  that  a  large  portion 
of  the  speech  signal  could  be  discarded  without  a  serious  disruption  of 
communication.   Garvey  (1953a),  taking  cognizance  of  these  results,  reasoned 
that  if  the  gaps  in  a  message  that  had  been  periodically  interrupted  were 
closed,  the  result  should  be  time  compressed  intelligible  speech  without 
distortion  in  vocal  pitch.   To  test  this  notion,  he  prepared  a  tape  on  which 
speech  had  been  recorded  by  periodically  cutting  out  short  segments  of 
tape,  and  by  splicing  the  free  ends  of  the  retained  tape  together  again. 
Reproduction  of  this  tape  achieved  the  desired  effect.   Garvey's  method 
was,  of  course,  too  cumbersome  for  any  but  research  purposes.   However, 
the  success  of  the  general  approach  having  been  shown,  an  efficient  technique 
for  accomplishing  it  was  not  long  to  follow. 

In  1954,  Fairbanks,  Everitt  and  Jaeger  published  a  description  of 
an  electromechanical  apparatus  which  makes  possible  the  time  compressed  or 
expanded  reproduction  of  recorded  tape.   Since  that  time,  other  similar 
equipment  has  been  made  commercially  available.   A  description  by  Foulke 
(1964)  of  the  device  used  in  his  research  characterizes  the  electromechanical 
approach  in  sampling  speech.   "The  device  we  use  to  compress  speech  is  the 
Tempo-Regulator, manufactured by Telefonbau und Normalzeit, Frankfurt-am-Main,
Germany.¹ The Tempo-Regulator samples recorded tape in the following manner.
The  tape  passes  over  the  curved  surface  of  a  cylinder  and  wraps  around  the 
cylinder  enough  to  make  contact  with  one-quarter  of  its  circumference.   Four 
tape  reproducing  heads  are    spaced  equally  around  the  circumference  of  the 
cylinder.   When  this  cylinder  is  stationary,  and  the  tape  is  moving  at  the 
same  speed  at  which  it  moved  during  recording,  (15  inches  per  second),  it 
makes  contact  with  one  of  the  reproducing  heads  and  the  signal  is  reproduced 
as  recorded.   When  the  Tempo-Regulator  is  adjusted  for  some  amount  of  compression, 
the  speed  of  the  tape  increases  and  the  cylinder  begins  to  rotate  in  the  direction 
of  tape  motion.   As  the  speed  of  the  tape  is  increased,  the  rotational  speed 
of  the  cylinder  is  increased  so  that  the  speed  of  the  tape  relative  to  the  surface 
of  the  cylinder  is  held  constant  at  15  ips.   Under  these  conditions,  each  of 
the four heads, in turn, makes and then loses contact with the tape. Each head
reproduces,  as  recorded,  the  material  on  the  portion  of  the  tape  with  which 


¹The successor to the Tempo Regulator is the Eltro Information Rate
Changer, sold in this country by Gotham Audio Corporation, 2 West 46th Street,
New York, New York, 10036.


it makes contact. When the cylinder is so positioned that one head is just
losing contact with the tape while the preceding head is just making contact
with the tape, the segment of tape that is wrapped around the cylinder between
these two heads never makes contact with a reproducing head and is therefore
not reproduced. The segment of the tape that is eliminated from the reproduction
in this manner is always the same length, one-quarter of the circumference
of the cylinder. The amount of speech compression depends upon the number of
such eliminations per unit time, and this, in turn, depends upon the tape
and cylinder speed. Speech may be expanded by reversing the direction of
rotation of the cylinder and moving the tape across the cylinder at a slower
speed than was used during recording." The apparatus described by Fairbanks,
et al., makes use of the principle just described. However, it differs in
that tape speed and cylinder speed are independently variable, making it
possible to vary the temporal value of the discarded portions of the message.²
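
The arithmetic implied by the Tempo-Regulator description quoted above can be sketched as follows. This is an illustration under stated assumptions, not the manufacturer's specification: it assumes that the speed of the tape relative to the cylinder surface is held at exactly the recording speed (15 ips in the quoted passage), so that the cylinder surface speed is the difference between the tape speed and the recording speed. The function and parameter names are hypothetical, and only compression (not expansion) is handled.

```python
def tempo_regulator_settings(tape_speed_ips, recording_speed_ips=15.0):
    """Relate tape speed to cylinder speed and amount of compression.

    Assumes the tape moves past the heads at `recording_speed_ips` relative
    to the cylinder surface, as in the quoted description.
    """
    if tape_speed_ips < recording_speed_ips:
        raise ValueError("this sketch covers compression only")
    # The cylinder surface must move with the tape fast enough to absorb
    # the excess tape speed.
    cylinder_surface_ips = tape_speed_ips - recording_speed_ips
    # Word rate after compression, relative to the original word rate.
    acceleration = tape_speed_ips / recording_speed_ips
    # Fraction of the tape (and of the original time) that is never reproduced.
    fraction_discarded = cylinder_surface_ips / tape_speed_ips
    return {
        "cylinder_surface_ips": cylinder_surface_ips,
        "acceleration": acceleration,
        "fraction_discarded": fraction_discarded,
    }


# Tape running at 30 ips: the cylinder surface moves at 15 ips, the word
# rate is doubled, and half of the recorded message is discarded.
settings = tempo_regulator_settings(30.0)
```

Under these assumptions the length of each discarded segment stays fixed while the number of eliminations per unit time grows with cylinder speed, which agrees with the quoted account.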

A  computer  may  also  be  used  to  compress  speech  by  the  sampling  method 
(Scott, 1965). In this approach, speech that has been transduced to electrical
form, e.g., the output of a microphone or tape reproducing head, is temporally
segmented by an analog-to-digital converter and these segments are stored
in  the  computer.   Then,  the  computer  samples  these  segments  according  to  a 
sampling  rule  for  which  it  has  been  programmed,  e.g.,  discard  every  third 
segment.   The  sample  thus  constructed  is  supplied  to  the  input  of  a  digital- 
to-analog  converter  and  the  signal  at  the  output  of  this  converter,  compressed 
in  time,  is  appropriate  for  transduction  to  acoustical  form  again. 
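
The sampling rule given as an example above, "discard every third segment," can be sketched in a few lines. This is a minimal illustration, not the program described by Scott (1965): the function name, the fixed segment length, and the use of a plain list of numbers to stand for the digitized signal are all assumptions made for the example.

```python
def compress_by_sampling(samples, segment_len, discard_every=3):
    """Drop every `discard_every`-th fixed-length segment and close the gaps.

    `samples` stands for the digitized speech signal; the retained segments
    are abutted in their original order.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    if discard_every < 2:
        raise ValueError("discard_every must be at least 2")
    kept = []
    starts = range(0, len(samples), segment_len)
    for index, start in enumerate(starts, start=1):
        if index % discard_every == 0:
            continue  # discarded segment; the next retained one follows directly
        kept.extend(samples[start:start + segment_len])
    return kept


# Twelve samples in segments of two: the third and sixth segments are dropped,
# leaving two thirds of the original signal.
digitized = list(range(12))
compressed = compress_by_sampling(digitized, segment_len=2, discard_every=3)
```

With this rule the compressed signal occupies two thirds of the original time, that is, about 33% compression.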

Two  variables  that  have  an  important  bearing  on  the  character  of  the 
time  compressed  speech  signal  are  the  temporal  value  of  the  discarded  and 
retained  portions  of  speech,  and  the  rule  according  to  which  speech  is  sampled. 
In  devices  like  the  Tempo  Regulator  and  its  successor,  the  Eltro  Information 
Rate  Changer,  the  temporal  value  of  retained  portions  of  speech  is  variable, 
but  the  temporal  value  of  discarded  portions  is  fixed  by  the  distance  between 
playback  heads  along  the  surface  of  the  rotating  cylinder  that  samples  the 
tape  to  be  compressed,  and  is  not  variable.   Using  the  Fairbanks  scheme,  the 
temporal  value  of  both  the  retained  and  the  discarded  portions  is  variable. 
Using  the  computer,  the  temporal  value  of  both  portions  is  variable  over  a 
wide range of values.

Electromechanical  compressors,  like  the  Tempo  Regulator  or  the 
Fairbanks  apparatus,  are  unselective  with  respect  to  the  parts  of  a  message 
that  are   discarded.   Portions  are   discarded  on  a  periodic  basis  and  may  occur 
anywhere  within  or  between  words,  and  it  is  quite  unlikely  that  a  given  message, 
subjected  to  consecutive  compressions,  would  be  sampled  in  the  same  way.   If 
sampling  is  accomplished  manually,  as  Garvey  did,  some  selectivity  is  possible. 
Though  Garvey  sampled  on  a  periodic  basis,  he  was  careful  to  insure  that  the 
onset  of  each  word  coincided  with  the  first  segment  of  tape  to  be  discarded  in 
sampling  that  word.   Diehl,  White  &  Burk  (1959),  using  Garvey's  manual  method, 
compressed  speech  to  some  extent  by  removing  the  time  between  words.   With 
use  of  the  computer,  it  is  feasible  to  use  a  great  variety  of  sampling  rules. 
For  instance,  a  program  might  be  written  according  to  which  the  temporal 
intervals  between  words  were  discarded  and  the  time  filled  by  a  given  word 
was  periodically  sampled  with  the  restriction  that  no  consonantal  sounds 
could  be  discarded. 
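
A selective rule of the kind just described might be sketched as follows. The sketch assumes that the message has already been divided into labeled segments by some earlier analysis step; that step, the label names ("pause", "consonant", "other"), and the function name are hypothetical and are not described in the sources cited.

```python
def selective_sample(segments, discard_every=2):
    """Apply a selective sampling rule to labeled segments.

    `segments` is a list of (label, samples) pairs. Intervals between words
    ("pause") are always discarded; segments within words are discarded
    periodically, except that "consonant" segments are never discarded.
    """
    if discard_every < 2:
        raise ValueError("discard_every must be at least 2")
    output = []
    within_word_count = 0
    for label, samples in segments:
        if label == "pause":
            continue  # time between words is removed entirely
        within_word_count += 1
        due_for_discard = within_word_count % discard_every == 0
        if due_for_discard and label != "consonant":
            continue  # periodic discard within a word
        output.extend(samples)  # retained, including every consonant
    return output


message = [
    ("consonant", [1, 2]), ("other", [3, 4]), ("other", [5, 6]),
    ("pause", [0, 0]),
    ("consonant", [7, 8]), ("other", [9, 10]), ("other", [11, 12]),
]
result = selective_sample(message, discard_every=2)
```

In this example the second consonant falls on a position that would otherwise be discarded and is retained because of the restriction.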

From  what  has  just  been  said,  it  would  appear  that  the  computer, 
because  of  its  flexibility,  offers  the  most  satisfactory  method  for  the  time 
compression  or  expansion  of  speech.   This  may  ultimately  prove  to  be  the  case. 
However,  at  present,  computer  time  is  too  expensive  to  justify  the  employment 
of  a  computer  in  this  capacity  for  any  but  research  purposes. 


²A detailed description of the means by which such variation is
accomplished is beyond the scope of the present paper. Those interested
in such a description are referred to the article by Fairbanks, Everitt,
and Jaeger (1954).


Methods  Used  in  Evaluating  Accelerated  Speech 
Some  Procedural  Problems 

There is no common practice in specifying the amount of compression
to which a listening selection has been subjected. This lack of uniformity
can  result  in  confusion,  especially  when  the  results  of  different  studies 
are    compared.   The  problem  has  been  discussed  by  Bellamy  (1966). 

The  amount  of  compression  may  be  specified  by  the  fraction,  expressed 
as  a  percent,  of  the  time  originally  required  for  the  production  of  a  message, 
that  is  saved  by  reproducing  that  message  at  a  faster  word  rate  (30%  compression 
means that 30% of the original time has been saved), or the complement of
that  fraction,  expressed  as  a  percent,  may  be  used  to  indicate  the  fraction 
of  the  original  time  remaining  after  compression.   Alternatively,  specification 
may  be  made  in  terms  of  the  acceleration  of  the  original  word  rate,  tape 
speed,  or  record  speed,  (an  acceleration  of  1.5  means  that  the  word  rate  after 
compression  is  1.5  times  the  word  rate  before  compression).   In  comparing  these 
indices,  it  must  be  remembered  that  the  relationship  between  them  is  not 
linear.   For  instance,  whereas  an  increase  in  acceleration  from  1.1  to  1.2 
corresponds  to  an  increase  in  the  percent  of  compression  from  9%  to  17%,  an 
increase  in  acceleration  from  1.9  to  2.0  corresponds  to  a  change  in  the 
percent of compression from 47% to 50%. A problem common to both indices
is  that  they  do  not  indicate  directly  the  word  rate  of  compressed  speech. 
The  final  word  rates  of  two  listening  selections  compressed  or  accelerated 
by  the  same  amount  will  depend  upon  the  original  or  uncompressed  word  rates, 
and  may  differ  considerably.   Although  initial  word  rates  ranging  from  125 
to  175  wpm  have  been  used  in  those  experiments  exploring  the  relationship 
between word rate and comprehension (Nelson, 1948; Harwood, 1955; Diehl,
et al., 1959; Fairbanks, Guttman, & Miron, 1957a; Foulke, Amster, Nolan &
Bixler,  1962),  the  general  conclusion  to  be  drawn  from  these  studies  is  that 
comprehension  is  only  moderately  affected  by  increasing  word  rate  until  a 
word  rate  of  approximately  275  or  280  wpm  is  reached,  and  that  comprehension 
begins  to  decline  rapidly  at  about  this  word  rate,  regardless  of  the  initial 
or uncompressed word rate. Also, in an experiment by Foulke (unpublished
data), a listening selection was read at three different rates (149, 164.6,
and 195.7 wpm) by a professional reader. These renditions were then compressed
to a final word rate of 275 wpm, and each rendition was presented to one of
three comparable groups of listeners. Upon hearing the selections, listeners
were  tested  for  comprehension,  and  the  resulting  distributions  of  test  scores 
were  not  significantly  different. 
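
The relationship among the indices discussed above can be made concrete with a short calculation. The sketch below is offered only as an illustration; the function names are hypothetical. It uses the definitions given in the text: percent of compression is the fraction of the original time saved, and acceleration is the ratio of the word rate after compression to the word rate before compression, so that compression = 1 - 1/acceleration.

```python
def percent_compression(acceleration):
    """Percent of the original time saved at a given acceleration."""
    return 100.0 * (1.0 - 1.0 / acceleration)


def acceleration_from_percent(percent):
    """Acceleration corresponding to a given percent of compression."""
    return 1.0 / (1.0 - percent / 100.0)


def final_word_rate(original_wpm, acceleration):
    """Word rate after compression, in words per minute."""
    return original_wpm * acceleration


# Equal steps in acceleration are unequal steps in percent of compression.
low_step = (percent_compression(1.1), percent_compression(1.2))
high_step = (percent_compression(1.9), percent_compression(2.0))

# The same 30% compression yields different final word rates for
# selections recorded at 125 wpm and at 175 wpm.
slow_reader = final_word_rate(125.0, acceleration_from_percent(30.0))
fast_reader = final_word_rate(175.0, acceleration_from_percent(30.0))
```

Rounded to whole percents, the first pair is 9% and 17% and the second is 47% and 50%, as stated in the text. The two final word rates are roughly 179 and 250 wpm, which is why neither index alone indicates the word rate of the compressed speech.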

Thus,  in  describing  compressed  speech,  specification  in  terms  of 
word  rate  appears  to  be  necessary,  and  it  is  probably  sufficient.   Word  rate 
is  probably  the  most  meaningful  dimension  in  terms  of  the  cognitive  and 
perceptual processes of the listener. Johnson, Darley & Spriestersbach (1963,
pg.  202-203)  have  summarized  research  supporting  the  conclusion  that  the 
perception  of  rate  of  speaking  corresponds  directly  to  the  oral  reading  rate 
in  words  per  minute. 

For  certain  purposes,  such  as  the  measurement  of  intelligibility, 
single  words  are    compressed,  and  it  is  of  course  meaningless  to  speak  of  the 
word  rate  of  a  single  word.   In  these  cases,  specification  must  be  stated 
in  terms  of  compression  or  acceleration. 

If  compressed  speech  is  to  be  specified  in  terms  of  percent  of 
compression  or  acceleration  ratio,  the  word  rate  of  the  original  production 
must  be  determined  and  reported.   There  is  no  "normal"  word  rate  that  can 
safely be assumed since there is considerable variability in the published
estimates of normal word rate. Part of this variability is undoubtedly due
to the difference between spontaneous conversational word rate and oral
reading word rate. Nichols and Stevens (1957) found a conversational speaking
rate of 125 wpm, while Johnson, et al. (1963, pg. 220) found a median oral
reading rate of 176.5 wpm and Foulke (unpublished research) found a mean
oral reading rate of 174 wpm. The oral reading rate is the word rate that
is relevant to the issue under discussion, since, in most cases, the speech
that is compressed is recorded oral reading. However, there is considerable
variability in the speaking rates of professional oral readers. In the
unpublished study just mentioned, Foulke found a standard deviation of 23.53
wpm.

There is confusion regarding the various terms used for compressed
speech.   Examples  are    compressed  speech,  accelerated  speech,  speeded  speech, 
and  rapid  speech.   These  terms  are  used  interchangeably  by  some  authors 
while at least some of them are used discriminately by others. The Library
of  Congress  (1966)  has  suggested  that  the  term  "compressed  speech"  be  reserved 
for  speech  that  has  been  accelerated  by  the  sampling  method,  while  the  term 
"rapid  speech"  should  be  reserved  for  speech  that  has  been  accelerated  by 
increasing the playback speed of a recording. In the opinion of the authors,
however,  the  term  "compressed  speech"  should  more  appropriately  be  regarded 
as  a  term  referring  to  recorded  speech  which  is  reproduced  in  less  than  the 
original time regardless of the method used.
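
The relation between the word rate of the original production, the percent of compression, and the resulting word rate can be made concrete with a short calculation. The following is a minimal sketch in Python, not a procedure taken from any of the studies cited. It assumes that "percent of compression" means the percent of the original playing time that is removed, and the function names are hypothetical.

```python
def compressed_word_rate(original_wpm: float, percent_compression: float) -> float:
    """Word rate of a recording after time compression.

    Assumption: percent_compression is the percent of the original playing
    time that is removed, so 50 means the recording plays in half the time.
    """
    if original_wpm <= 0:
        raise ValueError("original_wpm must be positive")
    if not 0 <= percent_compression < 100:
        raise ValueError("percent_compression must be in [0, 100)")
    remaining_fraction = 1 - percent_compression / 100
    return original_wpm / remaining_fraction


def compression_for_target(original_wpm: float, target_wpm: float) -> float:
    """Percent of compression needed to raise original_wpm to target_wpm."""
    if original_wpm <= 0 or target_wpm < original_wpm:
        raise ValueError("target_wpm must be at least original_wpm, and both positive")
    return 100 * (1 - original_wpm / target_wpm)


if __name__ == "__main__":
    # The same 50% compression yields different word rates for a
    # conversational speaker (125 wpm) and an oral reader (175 wpm).
    for wpm in (125, 175):
        print(wpm, "wpm original ->", compressed_word_rate(wpm, 50), "wpm compressed")
```

Under these assumptions, a stated percent of compression fixes the word rate only when the original word rate is also reported, which is the point made in the text.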

The  Measurement  of  Intelligibility 

Two  general  approaches  have  been  employed  in  the  evaluation  of  time 
compressed  speech:   tests  of  the  ability  to  repeat  brief  messages  accurately, 
and tests of the comprehension of listening selections. Brief message
reproduction is taken as an index of the intelligibility of time compressed
speech. A procedure typical of this approach is one in which single words
are compressed in time by some amount and presented, one at a time, to a
listener. The listener's task is to reproduce these words, orally or in
writing, and the intelligibility score is the percent of correctly identified
words. This procedure is sometimes referred to as an articulation test
(Miller, 1954, pg. 60). Disjunctive reaction time may also be taken as an
index of intelligibility (Foulke, 1965). The underlying rationale in this
case is that in the disjunctive R-T experiment, reduced intelligibility means
reduced discriminability. It has been shown that as stimuli are made more
similar and hence less discriminable, choice reaction time is increased
(Woodworth & Schlosberg, 1954, p. 33). The procedure, under this approach,
is to acquaint S with a list of words (e.g., three) and then to present them
to him one at a time in random order for identification. S indicates his
choice with a discriminative response (for instance, pressing the appropriate
one of several response keys). S can then be scored for reaction time and
accuracy.   The  experiment  is  performed  with  words  that  have  been  compressed 
in  time  by  several  amounts  and  changes  in  reaction  time  and/or  accuracy  are 
regarded  as  indicative  of  changes  in  intelligibility.   The  reaction  time  method 
may  be  more  sensitive  than  other  methods  since  a  change  in  the  amount  of  compression 
may  produce  a  change  in  the  reaction  time  to  words  that  are   discriminated  without 
error. 

Calearo  and  Lazzaroni  (1957)  report  the  use  of  a  method  familiar  to 
those in clinical audiology in order to detect the effects of compression. The
minimum intensity required for words to be intelligible is determined for
words at several levels of compression. Threshold intensity is defined as
that intensity at which some percent of a list of words (e.g., 50%) are correctly
identified. If a change in the compression of a list of words is accompanied
by a change in threshold intensity for that list, it is concluded that time
compression  has  altered  intelligibility. 
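
One way to locate such a threshold from a set of observations is to interpolate between the two intensities that bracket the criterion. The sketch below, in Python, is offered only as an illustration. Linear interpolation is an assumption and is not claimed to be the procedure of Calearo and Lazzaroni, and the scores shown are hypothetical values, not data from any study.

```python
def threshold_intensity(levels_db, percent_correct, criterion=50.0):
    """Estimate the intensity at which percent correct reaches the criterion.

    Assumption: percent correct rises with intensity, and the value between
    two tested intensities can be approximated by a straight line.
    """
    points = sorted(zip(levels_db, percent_correct))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= criterion <= y1 and y1 > y0:
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("criterion is not bracketed by the observed scores")


if __name__ == "__main__":
    levels = [10, 20, 30]                 # hypothetical intensities in db
    uncompressed = [30, 60, 90]           # hypothetical percent correct
    compressed = [20, 40, 80]             # hypothetical percent correct
    shift = threshold_intensity(levels, compressed) - threshold_intensity(levels, uncompressed)
    print("threshold shift in db:", shift)
```

A positive shift in this sketch corresponds to the case described in the text, in which compression is accompanied by a change in threshold intensity.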

Tests  of  Comprehension 

The  other  common  approach  in  evaluating  the  effects  of  the  acceleration 
of  speech  is  one  in  which  the  listener  first  hears  a  listening  selection  at 
some  accelerated  word  rate  and  then  is  tested  for  knowledge  of  the  facts  and 
implications  of  that  selection.   Any  kind  of  test  may  be  used,  but  researchers 
have,  in  general,  preferred  objective  tests  of  specifiable  reliability.   The 
multiple  choice  test  has  been  a  frequent  choice.   Many  researchers  (Enc  & 
Stolurow, 1960; Foulke, 1964; Voor & Miller, 1965) have used published tests
such  as  the  listening  sub-test  of  the  Sequential  Tests  of  Educational  Progress, 
which  consists  of  listening  selections  covering  a  broad  content  area    and  multiple 
choice  questions  concerning  these  selections.   Such  tests  have  the  advantage  of 
good  technical  specifications,  including  normative  data.   However,  in  some 
cases,  these  tests  have  not  been  well  adapted  to  the  researcher's  needs,  and 
he has undertaken the construction of his own tests (Foulke, et al., 1962;
Foulke, 1964; McLain, 1962; Orr & Friedman, 1964). Wood (1965) dealt with
the  problems  inherent  in  assessing  the  listening  comprehension  of  young  children 
by  scoring  their  responses  to  imperative  statements,  compressed  in  time  by 
various  amounts,  such  as  "buzz  like  a  bee". 

Some  measures  of  listening  comprehension  may  be  more  sensitive  than 
others. Bellamy (1966), in comparing the performance of a blind and a sighted
group  with  respect  to  the  ability  to  comprehend  accelerated  listening  selections, 
used  both  a  multiple  choice  test  and  an  interview  technique  and  reported  that  the 
interview technique revealed a difference in favor of the blind group, not
detected  by  the  multiple  choice  test.   Friedman,  Orr,  Freedel  and  Norris  (1966) 
used  short  answer  and  essay  tests  of  the  comprehension  of  accelerated  speech 
and found no discernible trends in performance as a function of practice in
listening to such speech. A multiple choice test, on the other hand, revealed
considerable improvement. They also found a lack of correlation between the
results  of  short  answer  and  essay  tests.   Perhaps  the  superiority  of  a  recognition 
test  over  a  recall  test  in  detecting  differences  in  the  comprehension  of 
accelerated  speech  is  a  consequence  of  the  retrieval  problems  arising  from  the 
incomplete  encoding  of  stimulus  material  in  partially  learned  tasks.   Robinson 
(1966  a,  b  &  c)  offers  evidence  in  support  of  the  view  that  memory  involves 
encoding  of  information  in  such  a  way  as  to  make  retrieval  possible.   An 
incompletely  encoded  message  might  be  released  by  stimuli  in  a  recognition 
test but unretrievable in a recall task.

The  two  general  approaches  just  presented  have  tended  to  yield  different 
results.   Increasing  the  amount  of  time  compression  appears  to  have  a  smaller 
influence on intelligibility (Garvey, 1953a) than on comprehension (Foulke, et al.,
1962). The diversity of procedures used in the study of accelerated speech
argues  for  a  generous  measure  of  caution  in  comparing  the  results  of  different 
experiments.   But  the  consistency  of  findings,  in  spite  of  such  diversity, 
lends  credence  to  the  relationships  just  mentioned.   The  lack  of  complete 
agreement  in  the  results  produced  by  the  two  approaches  in  evaluating  the 
effects  of  time  compression  may  be  more  than  an  artifact  of  the  method  used. 
It  may  be  that  such  disagreement  is  a  reflection  of  differences  in  the  mediating 
processes underlying the behaviors on which the two kinds of scores depend. If
so,  thorough  evaluation  will,  of  course,  require  both  approaches.   This  thesis 
will be developed more fully in a later section of the paper (see A Two Process
Hypothesis Regarding the Comprehension of Time Compressed Speech, pg. 19).


Factors  Affecting  The  Intelligibility  of 
Time  Compressed  Speech 

Factors  that  have  been  shown  to  have  an  effect  upon  the  intelligibility 
of  time  compressed  speech  can  be  divided  into  two  general  classes.   One  class 
includes  stimulus  variables  associated  with  the  context  in  which  the  signal  to 
be  identified  is  presented,  and  characteristics  of  the  signal  itself.   The 
second class includes organismic variables, such as the listener's age, sex,
intelligence,  and  prior  experience  with  stimulus  material.   In  this  section, 
the research relating to the characteristics of the signal and to the
listener's  familiarity  with  stimulus  words  is  reviewed. 

Characteristics  of  the  Signal 

(1)  The  Method  of  Compression 

The  intelligibility  of  time  compressed  words  depends  upon  the  method 
used  for  compression.   When  the  speed  changing  method  is  used,  a  compression  in 
time of approximately 33% results in a loss in intelligibility of 40% or more
(Fletcher, 1929; Garvey, 1953a; Klumpp & Webster, 1961). On the other hand,
Garvey  (1953a)  found  only  a  10%  loss  in  the  intelligibility  of  words  compressed 
60%  in  time  by  his  manual  sampling  method,  and  a  50%  loss  in  intelligibility 
at  75%.   Kurtzrock  (1957),  using  an  electromechanical  sampling  method,  found 
50% intelligibility for monosyllabic words presented at a compression of 85%.
Using  a  similar  method  and  similar  materials,  Fairbanks  and  Kodman  (1957)  obtained 
50%  intelligibility  at  a  compression  of  87%. 

Compression by either method increases the rate at which the discriminable
elements  of  speech  occur.   However,  whereas  vocal  pitch  is  unaffected  by  the 
sampling  method,  it  is  elevated  by  the  speed  changing  method.   The  difference 
in  the  intelligibility  of  words  compressed  by  the  two  methods  is  probably  due 
to  the  distortion  in  vocal  pitch,  since  this  is  the  factor  that  is  not  common 
to the two methods.

(2)  Intelligibility  and  Sampling  Rule 

The  sampling  period  of  speech  that  is  to  be  compressed  in  time  by 
the  sampling  method  is  the  interval  between  the  onsets  of  consecutive  retained 
portions  of  the  message.   Compression  is  accomplished  by  discarding  part  of 
this  interval.   It  is  the  ratio  of  the  retained  to  the  discarded  portions  of 
sampling  periods  that  determines  the  amount  of  compression.   If  ten  milliseconds 
of a twenty millisecond sampling period, or thirty milliseconds of a sixty millisecond
sampling period are retained, the result is the same: 50% compression.
For  any  given  sampling  period,  changing  the  ratio  of  retained  to  discarded 
portions  changes  the  amount  of  compression. 
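
The arithmetic of the sampling rule can be sketched as follows. This is a minimal illustration in Python that operates on a list of discrete values standing in for the recorded signal. The function names are hypothetical, and the sketch is not a description of any compressor mentioned in this paper.

```python
def percent_compression(retained_ms: float, discarded_ms: float) -> float:
    """Percent of each sampling period, and hence of the message, that is discarded."""
    period_ms = retained_ms + discarded_ms
    if retained_ms < 0 or discarded_ms < 0 or period_ms <= 0:
        raise ValueError("durations must be non-negative and the period positive")
    return 100 * discarded_ms / period_ms


def compress_by_sampling(samples, period, retained):
    """Keep the first `retained` values of every `period` values and abut them.

    Assumption: `samples` is any sequence standing in for the signal, and
    `period` and `retained` are counted in values rather than milliseconds.
    """
    if period <= 0 or not 0 < retained <= period:
        raise ValueError("need 0 < retained <= period")
    compressed = []
    for start in range(0, len(samples), period):
        compressed.extend(samples[start:start + retained])
    return compressed


if __name__ == "__main__":
    # Both sampling rules from the text give the same amount of compression.
    print(percent_compression(10, 10), percent_compression(30, 30))
    # Twelve values, period of four, two retained: half of the values survive.
    print(compress_by_sampling(list(range(12)), period=4, retained=2))
```

The first function shows why only the ratio of retained to discarded portions determines the amount of compression. The second shows the abutting of retained portions that distinguishes compressed speech from merely interrupted speech.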

When  the  sampling  method  is  used,  the  effect  that  a  given  amount  of 
compression  will  have  on  the  intelligibility  of  words  depends  upon  the  duration 
of  the  discarded  portion  of  the  sampling  period,  and  hence  upon  the  duration  of 
the  sampling  period  itself.   The  duration  of  the  discarded  portion  of  the 
sampling  period  must  be  short  relative  to  the  durations  of  the  speech  sounds  to 
be sampled. If it is not, a speech sound may fall entirely within the discarded
portion of a sampling period, in which case it would not be sampled at all.
With spondaic words compressed to 50% of their original durations, Garvey (1953a),
using discard intervals of 40 milliseconds, 60 msec, 80 msec, and 100 msec,
found  corresponding  intelligibility  scores  of  95.33%,  95.67%,  95%,  and  85.67%. 
In  a  two  factor  experiment  in  which  five  discard  intervals  and  eight  compressions 
were represented, Fairbanks and Kodman (1957) also found a substantial loss in
intelligibility when the duration of the discard interval exceeded 80 msec.
This  was  true  at  all  eight  compressions. 

The  intelligibility  of  a  word  may  be  degraded  if  the  word  is  sampled 
too frequently. Speech that is compressed in time by the sampling method
consists of a succession of abutted samples of the original speech. If the
transitions from sample to sample in this succession occur with sufficient
frequency, the result is an audible tone with definite pitch. If the sampling
rate is high enough, the pitch of this tone will intrude into the speech spectrum
and  mask  some  speech  frequencies.   Fairbanks  and  Kodman  (1957),  using  a  discard 
interval  of  10  msec,  found  90%  intelligibility  for  words  compressed  to  20% 
of their original durations. When this discard interval was changed to 40 msec,
they found 94% intelligibility. When a 10 msec discard interval is used to
compress speech to 20% of its original time, the retained samples are 2.5 msec
in duration and they occur at a rate of 400 per second. The 400 cycle tone
corresponding to this rate is well within the speech spectrum and might be
expected to interfere with intelligibility. If, on the other hand, a 40 msec
discard interval is used in compressing speech to 20% of its original duration,
the retained samples are 10 msec in length and they occur at a rate of 100
per  second.   The  audible  tone  of  corresponding  frequency  is  below  the  speech 
spectrum  in  this  case,  and  there  should  be  little  interference. 
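
The figures in the preceding paragraph follow from the discard interval and the fraction of the original duration that remains. A minimal sketch of the calculation in Python is given below. The function names are hypothetical, and the rate is taken to be the number of abutted samples per second in the compressed output, which is the reading that agrees with the values stated above.

```python
def retained_ms(discard_ms: float, fraction_of_original: float) -> float:
    """Duration of each retained sample for a given discard interval.

    fraction_of_original is retained / (retained + discarded), for example
    0.2 when speech is compressed to 20% of its original duration.
    """
    if discard_ms <= 0:
        raise ValueError("discard_ms must be positive")
    if not 0 < fraction_of_original < 1:
        raise ValueError("fraction_of_original must be between 0 and 1")
    return discard_ms * fraction_of_original / (1 - fraction_of_original)


def splice_rate_per_second(discard_ms: float, fraction_of_original: float) -> float:
    """Number of abutted samples per second in the compressed output."""
    return 1000 / retained_ms(discard_ms, fraction_of_original)


if __name__ == "__main__":
    for discard in (10, 40):
        print(
            discard, "msec discard:",
            round(retained_ms(discard, 0.2), 3), "msec retained,",
            round(splice_rate_per_second(discard, 0.2), 1), "samples per second",
        )
```

With a 10 msec discard interval the sketch gives samples of about 2.5 msec at about 400 per second, and with a 40 msec discard interval it gives samples of about 10 msec at about 100 per second, in agreement with the text.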

Cramer (1965) reports that when Ss use earphones to listen to speech
that has been compressed in time by the sampling method, delaying the signal
to one earphone by 7.5 msec improves intelligibility. This delay provides
what Cramer has called "binaural redundancy". If, as Garvey (1953b) suggests,
it is the briefness of highly compressed speech sounds that makes them
unintelligible, "binaural redundancy" may increase the effective duration of
such sounds. Scott (1965) reports a favorable result when Ss use one earphone
to listen to the normally retained samples of time compressed speech and the
other earphone to listen, at the same time, to the normally discarded samples
of  time  compressed  speech.   He  refers  to  such  speech  as  "dichotic  speech".   As 
yet,  there  are    no  data  on  which  to  base  a  comparison  of  "dichotic  speech"  and 
conventional  sampled  speech. 

(3) Intelligibility and the Rate of Occurrence of Speech Sounds

Garvey  (1953a)  compared  the  intelligibility  of  words  compressed  in  time 
by  the  sampling  method  with  the  intelligibility  reported  by  Miller  and  Licklider 
(1950)  for  words  that  had  been  interrupted  periodically.   Garvey's  words  and 
Miller  and  Licklider's  words  were  treated  alike  in  that  portions  of  sampling 
periods  were  discarded.   However,  the  retained  samples  of  Garvey's  words  were 
abutted  to  produce  time  compressed  speech,  while  the  retained  samples  of  Miller 
and  Licklider's  words  were  not  abutted  and  the  resulting  speech,  though 
interrupted,  was  not  compressed  in  time.   There  was  no  difference  between  the 
intelligibility  of  time  compressed  words  and  interrupted  words  when  50%  of 
each  word  was  discarded.   However,  when  62%  of  each  word  was  discarded,  interrupted 
words were 40% more intelligible than time compressed words. Since the two groups
of  words  were  alike  with  respect  to  the  amount  of  speech  information  that  had 
been  discarded,  the  poorer  intelligibility  of  the  time  compressed  words  when 
62%  of  the  speech  information  was  discarded  was  probably  due  to  the  accelerated 
rate of occurrence of speech sounds.

(4) Intelligibility and Word Structure

Kurtzrock  (1957)  found  that  compression  by  the  speed  changing  method 
degraded the intelligibility of vowel sounds more than consonantal sounds, and
that compression by the sampling method degraded consonantal sounds more than
vowel sounds. Garvey's Ss (1953b) rated the vowel sounds in words that had
been compressed in time by the sampling method higher in "goodness" than
consonantal sounds. In a study in which the number of phonemes in a word
was varied from three to nine, Henry (1966) found that increasing the number of
phonemes improved the intelligibility of words that had been compressed in time
by the sampling method. In a similar vein, Klumpp and Webster (1961) found
short phrases compressed in time by the speed changing method to be more
intelligible than single words. The findings of Henry and of Klumpp and
Webster are probably explained by the cues the Ss can derive from the contexts
of multiphonemic words and short phrases.

Characteristics of the Organism

(1)  Intelligibility  and  Prior  Experience 

Using  the  sampling  method,  Fairbanks  and  Kodman  (1957)  found  a  group 
of  highly  compressed  words  to  be  more  intelligible  than  a  group  of  the  interrupted 
words  of  Miller  &  Licklider  when  the  two  groups  were  equated  with  respect  to 
the  amount  of  speech  information  that  was  discarded.   However,  the  Ss  of  Fairbanks 
and  Kodman  had  received  extensive  familiarization  with  the  words  to  be  identified 
before tests were made, whereas the Ss of Miller and Licklider were relatively
naive.   Miller  and  Licklider  (1950),  using  interrupted  words,  and  Garvey  (1953b), 
using  words  compressed  in  time  by  the  sampling  method,  found  that  repeated 
exposure  to  such  words  improved  their  intelligibility. 

If  a  group  of  listeners  agree  that  a  particular  speech  sound  in  a  word 
that  has  been  compressed  in  time  by  the  sampling  method  is  unrecognizable,  it 
may  fairly  be  concluded  that  the  difficulty  lies  with  the  signal  itself.   However, 
Garvey found that Ss disagreed about the speech sounds that were rendered
unintelligible by compression of the words in which they occurred. Garvey
explained this finding in terms of the differential experience of Ss with respect
to the words in question. In this connection, Henry (1966) found words which occur
with greater frequency in general language, as indicated by the Lorge-Thorndike
count,  to  be  more  intelligible. 

(2)  Intelligibility  and  Anatomical  Damage 

The  intelligibility  of  time  compressed  speech  is  influenced  by  hearing 
capacity.   Calearo  and  Lazzaroni  (1957),  using  Ss  with  normal  hearing,  determined 
the  intensity  required  for  threshold  intelligibility  of  short  sentences  presented 
at 140, 250, and 350 wpm. With each increase in word rate, a 10 db increase in
intensity was required in order to maintain threshold intelligibility. When
elderly patients with presbycusis, and patients with temporal lobe tumors were
given the same test, increases in word rate were accompanied by much greater
losses in intelligibility. In a test of patients with hearing losses due to
peripheral damage, de Quiros (1964) found intelligibility thresholds resembling
those of Calearo's normal Ss at 140 and 250 wpm, but an elevated threshold at
350 wpm. Thus, it appears that the extent to which the intelligibility of time
compressed  speech  is  influenced  by  loss  of  hearing  capacity  depends  upon  the 
kind  of  underlying  anatomical  damage. 

The  Comprehension  of  Time  Compressed  Speech 

As  in  the  case  of  intelligibility,  the  comprehension  of  time  compressed 
speech  is  influenced  by  factors  relating  to  the  listener  and  to  the  stimulus. 
Factors relating to the listener include such variables as age, sex, and intelligence.
Stimulus factors include such variables as the amount and method of compression,
and  the  characteristics  of  an  oral  reader's  voice.   Of  course,  to  the  extent  that 
comprehension  of  connected  discourse  depends  upon  the  intelligibility  of  individual 
words, variables affecting word intelligibility will also affect comprehension.

Stimulus Variables

(1) Comprehension and the Amount of Compression in Time

When speech is compressed in time, the principal effect is the acceleration
of  the  rate  at  which  words  occur.   There  are    several  studies  in  which  comprehension 
has  been  measured  as  a  function  of  word  rate  but,  in  each  of  these  studies,  word 
rate  has  been  varied  through  a  relatively  limited  range.  Therefore,  in  order  to 
gain  an  impression  of  the  influence  of  this  variable,  it  is  necessary  to  combine 
the results of several studies.

Within the range from 126 to 172 wpm, Diehl, et al. (1959) found
listening comprehension to be unaffected by changes in word rate. In the range
from 125 to 225 wpm, Nelson (1948) and Harwood (1955) found a slight but
insignificant loss in comprehension as word rate was increased. Fairbanks, et al.
(1957a) found little difference in the comprehension of listening selections
presented at 141, 201, and 282 wpm. Thereafter, comprehension, as indicated by
percent of test questions correctly answered, declined from 58% correct at 282
wpm to 26% at 470 wpm. Foulke, et al. (1962), using both technical and literary
listening selections, found comprehension to be only slightly affected by increases
in word rate up to 275 wpm. However, in the range from 275 to 375 wpm, they found
an accelerated decrease in comprehension as word rate was increased. Foulke & Sticht
(in press), using the STEP Listening Test Form 1A, found a decrease in comprehension
of 6% between 225 and 325 wpm, and a decrease in comprehension of 14% between 325
and 425 wpm. The three studies just cited are in agreement regarding the finding
that  there  is  a  change  in  the  rate  at  which  comprehension  declines  as  word  rate 
is  increased.   A  similar  relationship  has  also  been  found  in  many  other  studies 
in  which  the  determination  of  the  influence  of  word  rate  upon  comprehension  was 
not the primary objective (Foulke, 1966c).

The  relationship  revealed  from  the  studies  reviewed  in  this  section  is 
one  in  which  comprehension,  as  indicated  by  test  scores,  decreases  as  word  rate 
or  the  amount  of  compression  is  increased.   However,  outcome  measures  based  upon 
test  performance  alone  do  not  take  into  account  the  learning  time  that  is  saved 
when  speech  is  presented  at  an  accelerated  word  rate.   Such  an  allowance  may  be 
made  by  dividing  the  comprehension  score  by  the  time  required  to  present  the 
message. This index of learning efficiency expresses the amount of learning per unit
time. Using such an index, Fairbanks, et al. (1957a), Enc and Stolurow (1960),
and Foulke, et al. (1962) found that learning efficiency increased as word rate
was  increased  up  to  approximately  280  wpm,  and  remained  constant  with  further  increases 
in  word  rate.   Thus,  although  one  who  listens  to  a  selection  presented  at  325  wpm 
may  not  be  able  to  demonstrate  as  much  comprehension  as  one  who  listens  at  a  normal 
rate,  he  may  be  learning  more  per  unit  time.   Using  the  same  logic,  Enc  and  Stolurow 
(1960) have computed an index of the efficiency of retention.
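
The index of learning efficiency described above can be sketched in a few lines. The following Python example is a minimal illustration only. It assumes that presentation time is obtained by dividing the number of words in the selection by the word rate, and the scores and selection length are hypothetical values, not results from the studies cited.

```python
def presentation_minutes(word_count: int, wpm: float) -> float:
    """Time required to present a selection of word_count words at wpm."""
    if word_count <= 0 or wpm <= 0:
        raise ValueError("word_count and wpm must be positive")
    return word_count / wpm


def learning_efficiency(score: float, word_count: int, wpm: float) -> float:
    """Comprehension score divided by the time required to present the message."""
    return score / presentation_minutes(word_count, wpm)


if __name__ == "__main__":
    selection_words = 2000                 # hypothetical selection length
    conditions = [(175, 80.0), (325, 70.0)]  # hypothetical (wpm, percent correct)
    for wpm, score in conditions:
        print(wpm, "wpm:", round(learning_efficiency(score, selection_words, wpm), 3),
              "score points per minute")
```

In this hypothetical case the listener at 325 wpm earns the lower test score but the higher score per minute of listening, which is the situation described in the text.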

The  word  rate  at  which  a  listening  selection  is  presented  apparently  has 
no special effect on the rate at which forgetting occurs. Enc and Stolurow (1960),
Friedman,  Orr,  Freedel  and  Norris  (1966),  and  Foulke  (1966c)  have  performed  studies 
in  which  tests  of  the  comprehension  of  listening  selections  presented  at  several 
word  rates  have  been  made  after  several  retention  intervals.   In  general,  these 
studies  support  the  conclusion  that  differences  in  the  course  of  forgetting  are 
due  to  differences  in  original  learning.   Of  course,  as  has  already  been  shown, 
the  amount  of  original  learning  is  a  function  of  the  word  rate  at  which  a  listening 
selection is presented.

(2) Comprehension and the Method of Compression

McLain (1962) and Foulke (1962), using Ss who were naive with respect
to compressed speech and unaccustomed to reading by listening, compared the
comprehension of speech compressed by the sampling method with the comprehension
of speech compressed by the speed changing method. In both instances, a slight
but statistically significant advantage was found for the sampling method.
However, in a similar experiment in which blind school children, who were
accustomed to reading by listening, served as Ss, Foulke (1966a) found no
statistically  significant  difference  in  favor  of  either  method.   The  conclusion 
suggested  by  the  research  just  cited  is  that  the  relative  superiority  of  the 
sampling  method  is  slight,  and  that  this  superiority  may  be  erased  by  experience 
in  reading  by  listening.   However,  there  are  not  enough  data  to  provide  firm 
support  for  any  conclusions.   In  the  studies  mentioned  here,  all  Ss  have  been 
naive  with  respect  to  time  compressed  speech  of  any  kind,  the  Ss  have  had 
only brief exposures to time compressed listening selections, and tests have
been  made  in  a  limited  range  of  word  rates.   There  can  be  little  disagreement 
among  listeners  about  which  kind  of  compressed  speech  results  in  a  more 
agreeable  listening  experience.   Whereas  compression  by  the  speed  changing 
method  results  in  serious  distortion  of  vocal  quality  that  renders  the  speaker's 
voice  unrecognizable,  compression  by  the  sampling  method  preserves  vocal 
quality  so  that,  even  at  very  fast  word  rates,  a  listener  can  often  identify 
a  speaker  with  whose  voice  he  is  familiar. 

The  issue  at  stake  is  important  for  both  practical  and  theoretical 
reasons.   Practically  speaking,  the  equipment  required  for  the  compression  of 
speech  by  the  speed  changing  method  is  readily  available  and  cheap  enough  so 
that  individual  ownership  of  compressors  is  feasible.   On  the  other  hand,  the 
equipment  required  for  the  time  compression  of  speech  by  the  sampling  method 
is  scarce  and  very  expensive.   If  it  is  not  possible  to  demonstrate  an  obvious 
advantage  with  respect  to  comprehension,  for  speech  compressed  by  the  sampling 
method,  the  mere  fact  that  the  product  of  the  sampling  method  is  more  pleasing 
to  hear  may  not  be  sufficient  to  justify  the  added  expense  it  entails. 

From  a  more  theoretical  point  of  view,  the  finding  that  the  obvious 
superiority  of  the  sampling  method,  when  the  comparison  is  based  on  the 
intelligibility  of  single  words,  cannot  be  demonstrated  when  the  comparison 
is  based  on  the  comprehension  of  connected  discourse,  is  quite  interesting. 
It  suggests  that  some  other  factor,  such  as  the  rate  at  which  words  occur, 
is  relatively  more  important  than  the  intelligibility  of  single  words  in 
determining  the  comprehension  of  time  compressed  speech.   This  suggestion, 
if  substantiated  experimentally,  has  important  implications.   The  intelligibility 
of  a  single  compressed  word  should  be,  in  large  part,  a  function  of  the  signal 
quality  of  the  equipment  used  in  compressing  it.   If,  beyond  a  certain  point, 
the  intelligibility  of  single  words  becomes  relatively  unimportant,  then  little 
gain  can  be  expected  from  efforts  directed  at  further  refinement  of  speech 
compression  equipment.   Attention  will  be  directed  more  appropriately  to  a 
consideration  of  the  perceptual  and  cognitive  processes  of  the  listener  who 
must  contend  with  accelerated  speech. 

(3)  The  Nature  of  the  Material  to  be  Comprehended 

The  way  in  which  the  level  of  difficulty  of  a  listening  selection  and 
the  word  rate  at  which  it  is  presented  interact  to  produce  a  given  comprehension 
score  may  depend  upon  the  formula  used  for  estimating  difficulty.   There  are 
several  schemes  for  estimating  the  difficulty  of  reading  selections  (Dale  & 
Chall, 1948; Flesch, 1948), and it has generally been assumed that these
schemes  are    equally  valid  for  determining  the  difficulty  of  listening  selections. 


Different  schemes  often  produce  different  estimates  of  difficulty,  and  no  study 
has  been  found  in  which  different  schemes  have  been  compared  with  respect  to 
their  ability  to  select  the  kind  of  material  that  will  be  most  affected  by 
time  compression.   The  evidence  currently  available  comes  from  studies  that 
cannot safely be compared because they have made use of different listening
selections and subject populations, and have explored different ranges of the
word rate variable.

Nelson (1948) and Harwood (1955) found the comprehension of a listening
selection rated difficult by the Flesch Formula to be more adversely affected
by increasing the word rate, within the range from 125 to 225 wpm, than that of a
listening selection rated relatively less difficult by the same formula. Enc
and Stolurow (1960) found considerable variability in mean comprehension test
scores for ten reading selections presented at a normal word rate and at a
slightly accelerated word rate, in spite of the fact that they were rated as
equal in difficulty by the Dale-Chall Formula. Using one normal and four
accelerated word rates, Foulke, et al. (1962) measured the listening
comprehension  of  a  scientific  selection  and  a  literary  selection.   In  spite  of 
the  fact  that  the  two  selections  were  rated  as  equal  in  difficulty  by  the  Dale- 
Chall  Formula,  comprehension  of  the  scientific  selection  was  generally  poorer 
than comprehension of the literary selection. Although there was a significant
interaction between word rate and the nature of the listening selection, it
was probably due to the fact that comprehension of the scientific selection was
already poorer than that of the literary selection at a normal word rate, which
left a reduced range within which comprehension scores of the scientific
selection could vary.

The  estimate  of  reading  difficulty  obtained  with  the  Flesch  Formula 
is  largely  determined  by  word  and  sentence  length,  whereas  the  estimate  obtained 
with  the  Dale-Chall  Formula  depends  primarily  upon  the  number  of  words  not 
found  in  Dale's  lists  of  words  easily  understood  at  various  grade  levels. 
In  view  of  these  differences,  it  is  not  surprising  that  the  formulas  often 
produce  different  estimates.   The  finding  of  a  systematic  interaction  between 
difficulty  and  word  rate  for  listening  selections  rated  different  in  difficulty 
by  a  particular  formula  would  provide  a  kind  of  rational  validity  for  that 
formula.
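
The contrast between the two formulas can be made concrete with a short sketch. It uses the published 1948 forms of the Flesch Reading Ease score and the Dale-Chall raw score; the function names and the counts in the example are illustrative assumptions, not data from any study cited here.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch (1948) Reading Ease: driven by sentence length and word length.

    Higher scores indicate easier material.
    """
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def dale_chall_raw_score(words: int, sentences: int, unfamiliar_words: int) -> float:
    """Dale-Chall (1948) raw score: driven mainly by words absent from Dale's list.

    Higher scores indicate harder material.
    """
    percent_unfamiliar = 100.0 * unfamiliar_words / words
    return 0.1579 * percent_unfamiliar + 0.0496 * (words / sentences) + 3.6365


if __name__ == "__main__":
    # Two hypothetical 100-word passages with identical word and sentence
    # lengths but different vocabulary (counts invented for illustration).
    for label, unfamiliar in (("familiar vocabulary", 5), ("unfamiliar vocabulary", 25)):
        ease = flesch_reading_ease(words=100, sentences=5, syllables=150)
        raw = dale_chall_raw_score(words=100, sentences=5, unfamiliar_words=unfamiliar)
        print(f"{label}: Flesch = {ease:.1f}, Dale-Chall raw = {raw:.2f}")
```

In this example the Flesch score is the same for both passages, while the Dale-Chall score separates them, which is one way the two schemes can disagree about the same material.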

Rodgers (1962) has suggested a procedure for estimating the difficulty
of  listening  selections  that  takes  into  account  the  average  idea  length, 
which  is  found  by  dividing  the  number  of  words  in  a  listening  selection  by 
the  number  of  independent  clauses  in  that  selection.   The  effect  of  increasing 
word  rate  upon  the  comprehension  of  listening  selections  at  several  levels 
of difficulty, as determined by Rodgers' method, has not yet been explored.
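
The average idea length described above is a simple ratio. A minimal sketch follows; the function name and the example counts are hypothetical, and the identification of independent clauses is assumed to have been done by hand beforehand.

```python
def average_idea_length(word_count: int, independent_clause_count: int) -> float:
    """Words per independent clause, as described in the text.

    The clause count must be supplied by the analyst; this sketch does not
    attempt to parse the selection.
    """
    if independent_clause_count <= 0:
        raise ValueError("a selection must contain at least one independent clause")
    return word_count / independent_clause_count


if __name__ == "__main__":
    # Hypothetical selection: 120 words in 8 independent clauses.
    print(average_idea_length(120, 8))
```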

As  previously  mentioned,  it  has  generally  been  assumed  that  reading 
difficulty  and  listening  difficulty  are  the  same.  However,  this  assumption 
may  not  be  justified,  and  the  required  experimental  comparison  would  appear 
to  be  a  simple  matter. 

(4)   The  Reader's  Style  and  Vocal  Quality 

Oral  readers  differ  considerably  with  respect  to  vocal  timbre  and,  of 
course,  there  are   conspicuous  sex  differences  in  vocal  pitch.   Oral  readers 
also  differ  with  respect  to  factors  such  as  average  word  rate,  and  variation 
in  word  rate,  pitch,  and  loudness,  that  combine  to  define  the  personal  oral 
reading style. In a preliminary investigation, Foulke (1964) explored the
extent  to  which  these  factors  interact  with  word  rate  in  determining  listening 
comprehension.   An  experiment  was  performed  in  which  three  versions  of  a 
listening  selection,  each  read  by  a  different  reader  (two  male  and  one  female), 
were  presented  to  groups  of  college  students  at  a  normal  word  rate  and  a  word 
rate  that  was  accelerated  by  the  sampling  method.   There  were  significant 
differences in comprehension associated with the reader and with the word
rate.   However,  the  reader's  effect  on  the  comprehension  of  the  listening 
selection  did  not  depend  upon  the  word  rate.   The  results  of  this  experiment 
were  inconclusive,  because  of  the  small  number  of  readers  used.   There  is 
general  agreement  among  those  who  have  had  extensive  experience  in  listening 
to  speech  compressed  in  time  by  the  sampling  method  that  some  voices  and  reading 
styles  withstand  the  ravages  of  compression  better  than  others. 

Listener  Variables  That  Affect  Comprehension 

Some  listeners  show  good  comprehension  of  speech  presented  at  a  rate 
of  350  wpm  or  faster,  with  little  or  no  prior  experience  in  listening  to  such 
speech.   Other  listeners  show  poor  comprehension  of  accelerated  speech,  even 
after  prolonged  exposure  to  such  speech  (Foulke,  1964).   These  marked  and 
persistent individual differences are undoubtedly the consequence of the
interaction of a host of organismic variables. An effort has been made to
clarify the contribution of such variables in determining listening comprehension
at accelerated word rates, but it is only a beginning effort and a good
deal  of  research  will  be  required  before  this  class  of  variables  can  be 
taken  into  account  properly. 

(1)  The  Sex  of  the  Listener 

Comparisons of the comprehension test scores of male and female
listeners have revealed no sex-related differences in comprehension for word
rates varying from 174 to 475 wpm (Foulke & Sticht, 1967; Orr & Friedman, 1965).

(2)  The  Listener's  Age  and  Educational  Experience 

In the research relevant to this topic, school children have served
as Ss, and their age and amount of education have, of course, varied
concomitantly. Therefore, the outcome of such experiments cannot, strictly speaking,
be related to either age or amount of education alone. Fergen (1954) and Wood
(1965) found a positive relationship between the grade level of school children
and the comprehension of accelerated speech. Together, their experiments
included grades one, three, four, five, and six. Since the Ss' task in Wood's
experiment was to carry out the instructions communicated by short imperative
statements, his measurements probably pertained more to intelligibility than
to comprehension.

High  school  and  college  students  have  served  as  Ss  in  many  experiments 
in  which  listening  comprehension,  as  a  function  of  word  rate,  has  been  determined. 
However,  due  to  different  experimental  materials  and  conditions,  these  experiments 
cannot  safely  be  compared  with  a  view  to  determining  the  effects  of  age  and 
education on the comprehension of accelerated speech. The ability of aged Ss
to comprehend accelerated listening selections has not been determined. However,
the results obtained by Calearo and Lazzaroni (1957) in testing aged Ss for
the intelligibility of short time compressed sentences are suggestive. One
might  reasonably  expect  a  relatively  large  decline  in  the  ability  of  aged  Ss 
to  comprehend  accelerated  speech.   Of  course,  such  a  decline  would  not  be  due 
to  age  per  se,  but  to  the  involutional  changes  in  the  central  nervous  system 
accompanying  old  age. 

(3)  The  Intelligence  of  the  Listener 

In  the  case  of  children,  the  evidence  presently  available  is  not  sufficient 
to  permit  a  conclusion  regarding  the  effect  of  intelligence  on  the  comprehension 
of accelerated speech. Fergen (1954) found no relationship between the IQ's
of  grade  school  children  and  their  ability  to  comprehend  accelerated  listening 
selections.   However,  230  wpm  was  the  fastest  word  rate  represented  in  her 
experiment. Wood (1965) found no relationship between IQ and the ability to
follow  instructions  communicated  by  short  time  compressed  imperative  statements. 
However,  as  previously  mentioned,  his  procedures  resemble  more  closely  those 
used  in  testing  for  intelligibility. 

A more definite conclusion is possible in the case of adults. Fairbanks
et al. (1957a, 1957b), Goldstein (1940), and Nelson (1948) have all found a
positive relationship between intelligence and the ability to comprehend
accelerated speech. The data of Fairbanks et al. (1957a) and of Goldstein
(1940) concur in showing a positive relationship between the intelligence of
the listener and the magnitude of the decline in listening comprehension as word
rate is increased. This relationship is undoubtedly due to the fact that
intelligent Ss earn higher scores on the tests of comprehension of listening
selections presented at a normal word rate, which provide the basis for
comparison, so that the scores they earn on the tests of comprehension of
listening selections presented at accelerated word rates have a larger range
within which to vary. Those of lower intelligence perform nearer to the chance
level at normal rates, and can persist with chance level performance over a
wide range of word rates.

(4)  The  Visual  Status  of  the  Listener 

There  are  a  priori  grounds  for  expecting  blind  individuals  to  show 
better  listening  comprehension  than  sighted  individuals.   In  general,  blind 
people depend to a much greater extent than sighted people upon aural communication.
Increasingly,  blind  students,  and  other  blind  people  who  read,  do  so  by  listening 
to  recorded  books.   The  practice  afforded  by  such  experience  might  be  expected 
to  improve  listening  ability  and  this  improved  ability  should  be  advantageous 
to  listening  to  accelerated  speech  as  well.   Furthermore,  whereas  accelerated 
speech  may  be  little  more  than  a  curiosity  to  the  average  sighted  person,  it 
may  be  perceived  by  the  blind  person  as  a  potential  solution  to  the  serious 
reading  problem  he  experiences  by  virtue  of  the  slow  rate  at  which  he  reads 
ordinarily. When such a person serves as an S in an experiment in which the
comprehension  of  accelerated  speech  is  measured,  he  might  be  expected  to  maintain 
a  more  attentive  adjustment. 

The  research  related  to  this  question  is  meager  and  the  results  are 
conflicting. Foulke (1964) offered evidence for superior comprehension by
blind Ss of time compressed listening selections. In a direct comparison,
Bellamy (1966) found no difference between blind and sighted Ss with respect
to the comprehension of accelerated listening selections. Furthermore, in an
experiment performed by Hartlage (1963) blind and sighted Ss did not differ
with  respect  to  their  comprehension  of  listening  selections  presented  at  a 
normal  word  rate. 

(5)  Reading  Rate  and  Listening  Rate 

Those  perceptual  and  cognitive  factors,  whatever  they  may  be,  that 
are    responsible  for  individual  differences  in  reading  rate,  may  also  be  responsible 
for  individual  differences  in  the  ability  to  comprehend  accelerated  speech.   If 
this  is  true,  fast  readers  should  be  able  to  comprehend  speech  at  a  faster  word 
rate  than  slow  readers.   This  hypothesis  has  been  tested  by  Goldstein  (1940) 
and  by  Orr,  Friedman,  and  Williams  (1964).   In  both  experiments  a  significant 
positive correlation was found between reading rate and the ability to comprehend
accelerated  speech.   In  both  experiments,  it  was  also  found  that  practice  in 
listening  to  accelerated  speech  resulted  in  an  improvement  in  reading  rate. 
This  finding  adds  further  support  to  the  hypothesis  that  the  two  performances 
in  question  may  be  mediated,  at  least  in  part,  by  the  same  underlying  factors. 
Nelson (1948) found no correlation between reading rate and the ability to
comprehend accelerated speech. However, his measures of reading rate were
taken from college entrance examination data collected some time prior to his
study, while Goldstein and Orr et al. obtained their measures of reading rate
during  their  investigations. 

Goldstein (1940) and Jester and Travers (1965) compared the comprehension
resulting  from  listening  to  selections  presented  at  several  word  rates  with 
the  comprehension  resulting  from  reading  the  same  selections,  presented  at  the 
same  word  rates.   In  both  cases,  comprehension  declined  as  word  rate  was  increased. 
Listening  comprehension  was  superior  to  reading  comprehension  up  to  approximately 
200 wpm. Above 200 wpm, reading comprehension was superior. Simultaneous reading
and  listening  at  350  wpm  resulted  in  better  comprehension  than  could  be  demonstrated 
with  either  mode  of  presentation  alone.   This  finding  further  emphasizes  the 
compatibility  of  the  two  processes. 

(6)   Improving  the  Comprehension  of  Time  Compressed  Speech 

In an experiment performed by Fairbanks et al. (1957a), a mean comprehension
score of 63.8% was obtained by Ss who listened to a selection presented at the
uncompressed rate of 141 wpm. (Percent refers to the fraction of items answered
correctly on the test of comprehension.) Compressing this selection by 50%
to a word rate of 282 wpm resulted in a mean comprehension score of 58%. With
two consecutive presentations of the selection at 282 wpm, the mean comprehension
score was 65.4%. Though the Ss who served in this condition of the experiment
did  not  save  any  listening  time,  the  two  exposures  did  result  in  slightly 
improved  comprehension. 
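
The arithmetic behind this comparison can be checked with a brief sketch. It assumes that a compression of c (as a fraction) removes that fraction of the playback time, so that the word rate becomes the original rate divided by (1 - c). The function names and the 1410-word selection length are hypothetical, chosen only to give round numbers.

```python
def compressed_word_rate(original_wpm: float, compression: float) -> float:
    """Word rate after removing `compression` (0 <= c < 1) of the playback time."""
    if not 0.0 <= compression < 1.0:
        raise ValueError("compression must be a fraction in [0, 1)")
    return original_wpm / (1.0 - compression)


def listening_minutes(word_count: int, wpm: float, presentations: int = 1) -> float:
    """Total listening time for a selection heard `presentations` times."""
    return presentations * word_count / wpm


if __name__ == "__main__":
    fast = compressed_word_rate(141, 0.50)       # the 50% condition in the text
    once_normal = listening_minutes(1410, 141)   # hypothetical 1410-word selection
    twice_fast = listening_minutes(1410, fast, presentations=2)
    print(fast, once_normal, twice_fast)
```

Under this assumption, two presentations at 50% compression occupy exactly the time of one uncompressed presentation, which is why no listening time was saved in that condition.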

In a second study by the same investigators (Fairbanks et al., 1957c),
elaborations  and  commentaries  were  written  for  selected  facts  in  a  listening 
selection.   The  recorded  version  of  this  elaborated  selection  was  then  compressed 
by  the  sampling  method  enough  so  that  its  playback  time  equalled  the  playback 
time  of  the  uncompressed  and  unelaborated  version.   The  objective  was  to  determine 
whether  or  not  comprehension  could  be  improved  by  trading  the  temporal  redundancy 
in  the  uncompressed  selection  for  the  verbal  redundancy  in  the  elaborated  selection. 
The  results  were  positive,  showing  better  comprehension  for  the  compressed  selection 
with  verbal  redundancy.   Analysis  of  test  results  indicated  decreased  comprehension 
for  those  portions  of  the  compressed  selection  that  had  not  been  elaborated. 
The explanation of this finding is probably that Ss associated verbal redundancy
with importance and were thus relatively less attentive to the unelaborated
material.

Improving  the  comprehension  of  time  compressed  speech  by  a  training 
experience  of  some  sort  is  an  obvious  possibility,  and  several  investigators 
have  devised  and  evaluated  training  experiences.   The  simplest  and  least 
sophisticated training experience that has been evaluated is mere exposure.
Voor and Miller (1965) exposed Ss to five brief listening selections presented
at a rate of 380 wpm. The total listening time was 17.5 minutes. A multiple
choice test of comprehension followed each selection. Comprehension increased
as a function of exposure up to 7 minutes, and remained constant thereafter.
These results probably reflect a simple adjustment to the initially unfamiliar
task of listening to accelerated speech. Foulke (1964) gave blind school
children, with considerable experience in reading by listening, over 25 hours
of exposure to speech at a rate of 350 wpm. Ss were tested for listening
comprehension at 350 wpm, before and after training, with equivalent forms
of the STEP Listening Test. The distributions of pre-training and post-
training STEP Test scores were not significantly different.

Simple  exposure  may  have  failed  as  a  training  experience  because 
listeners  were  not  attending  to  the  training  material.   Therefore,  in  a  second 
condition  of  the  same  experiment,  a  comparable  group  of  Ss  received  the  same 
treatment  with  the  one  difference  that  the  listening  material  used  for  training 
was  interrupted  frequently  and  Ss  were  questioned  about  material  just  heard. 
It  was  hoped  that  the  demand  made  upon  Ss  by  this  procedure  would  promote  a 
more  attentive  attitude.   Again,  the  distributions  of  pre-training  and  post- 
training  STEP  Listening  Test  scores  were  not  significantly  different. 

A  further  modification  of  the  training  experience  was  represented  in 
the  two  remaining  conditions  of  the  experiment.   In  these  conditions,  training 
material  was  presented  initially  at  a  normal  word  rate.   As  training  progressed, 
the word rate was gradually increased until, near the end of the training period,
a  rate  of  350  wpm  had  been  reached.   In  one  of  these  conditions,  training 
material  was  presented  without  interruption.   In  the  other  condition,  training 
material  was  interrupted  for  questioning.   The  results  were  the  same.   In 
neither  case  were  there  differences  between  pre-training  and  the  post-training 
STEP  Listening  Test  scores  that  could  be  attributed  to  the  training  experience. 

In an experiment reported by Orr, Friedman and Williams (1964), sighted
Ss, with presumably less practice in reading by listening than Foulke's blind
Ss, were given a training experience that consisted of exposure to listening
material presented initially at 325 wpm, and increased in steps of 25 wpm over
a period of several weeks to a final word rate of 475 wpm. These Ss were then
tested for listening comprehension at 475 wpm and a comparison of their post-
training test scores with equivalent pre-training test scores revealed an
improvement in comprehension of 29.3%. Since the Ss in this experiment had
probably not had extensive practice in reading by listening, and since there
was no control group in which Ss received practice in listening to speech
presented at a normal rate, these results cannot be attributed unequivocally
to  practice  in  listening  to  accelerated  speech.   The  improvement  that  was 
found  may  simply  have  been  a  consequence  of  practice  in  listening. 

Friedman, Orr, Freedle and Norris (1966) compared the comprehension
test scores of Ss given 35 hours of massed practice in listening to accelerated
speech with comprehension test scores of Ss given 12 to 14 hours of distributed
practice in listening to accelerated speech. They concluded that the comprehension
demonstrated by the distributed practice group was as good as or better than the
comprehension demonstrated by the massed practice group.

It  is  clear  from  the  research  just  reviewed  that  an  adequate  training 
experience  for  improving  the  comprehension  of  accelerated  speech  has  yet  to 
be  found.   It  may  safely  be  concluded  that  simple  exposure,  at  least  in  the 
amount  so  far  tested,  is  not  adequate.   Exposure  to  speech,  the  word  rate  of 
which  is  slowly  increased,  may  have  some  benefit,  but  the  evidence  so  far 
available  will  not  support  a  definite  conclusion  in  this  regard.   Further 
research  is  clearly  indicated  and  it  will  be  necessary,  among  other  things, 
to  determine  the  way  in  which  the  amount  of  prior  experience  in  reading  by 
listening  interacts  with  the  kind  of  training  experience  provided. 

A Two Process Hypothesis Regarding the Comprehension
of  Time  Compressed  Speech 

By now, enough research has been accomplished to permit a fairly accurate
description  of  the  relationship  between  word  rate  and  comprehension  over  a 
fairly  wide  range  of  values  for  the  word  rate  variable.   The  relationship  that 
emerges  is  not  a  linear  one  and  its  explanation  is,  therefore,  apt  to  be  somewhat 
involved. There are two general classes of results which, when taken together,
suggest that the relationship between word rate and comprehension is structured
by more than one underlying process. First, there are those studies in which
comprehension has been measured at various word rates. Although no single
study  has  included  an  adequate  range  of  values  for  the  word  rate  variable, 
when  these  studies  are  considered  collectively,  the  relationship  that  emerges 
is  one  in  which  comprehension  declines  at  a  slow  rate  as  word  rate  is  increased 
until  a  word  rate  of  approximately  275  wpm  is  reached,  and  at  a  much  faster  rate 
thereafter (See Stimulus Variables, pg. 12). Then, there are the experiments
in which word intelligibility has been determined at several compressions
(Characteristics of the Signal, pg. 9). When the results of these studies are
compared with the results of studies in which comprehension has been measured
at several word rates, there is a strong suggestion that comprehension declines
more rapidly than intelligibility as the amount of compression is increased.
This suggestion has been confirmed by an experiment (Foulke & Sticht, 1967)
in which both intelligibility and comprehension were determined at several
compressions. As the amount of compression was increased, both intelligibility
and comprehension decreased, but intelligibility was always superior to
comprehension and was affected much less than comprehension by increasing the
amount of compression.

The  fact  that  increasing  the  amount  of  compression  has  a  different 
effect  upon  comprehension  than  upon  intelligibility  suggests  that  decreased 
intelligibility  is  not,  in  itself,  an  adequate  explanation  of  the  loss  in 
comprehension. One might expect decreased intelligibility to interfere with
comprehension to some extent, but the cues that become available to a listener
when he hears a succession of meaningful words with high sequential dependency,
as is the case when comprehension is measured, should at least partially
compensate for this interference. In any case, if intelligibility were the
only factor, comprehension would not decline at a different rate than
intelligibility. The change in the rate at which comprehension declines beyond
275  wpm  may  mean  that  when  a  certain  critical  word  rate  is  reached,  an 
additional  factor  begins  to  determine  the  loss  in  comprehension.   The  perception 
of  speech  entails  the  registration,  encoding  and  storage  of  speech  information, 
and  these  operations  require  time.   When  the  word  rate  is  too  high,  words 
cannot be processed as fast as they are received, with the result that some of
the words and their associated meanings are lost. To put it another way,
when  channel  capacity  is  exceeded,  some  of  the  input  cannot  be  recovered  at 
the  output  (Miller,  1953  &  1956). 
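
The channel capacity idea can be illustrated with a deliberately crude sketch. It is a toy model under stated assumptions, not a fitted description of any data reviewed here: it assumes a fixed processing capacity (set, for illustration only, at the 275 wpm critical rate mentioned above), assumes that words arriving faster than capacity are simply lost in proportion, and ignores the gradual decline in comprehension observed below the critical rate. The function name is hypothetical.

```python
def fraction_recoverable(word_rate_wpm: float, capacity_wpm: float = 275.0) -> float:
    """Fraction of incoming words that a fixed-capacity channel could process.

    Below capacity everything is processed; above it, only capacity/rate
    of the input can be recovered (toy assumption).
    """
    if word_rate_wpm <= 0 or capacity_wpm <= 0:
        raise ValueError("rates must be positive")
    return min(1.0, capacity_wpm / word_rate_wpm)


if __name__ == "__main__":
    for rate in (175, 225, 275, 325, 375, 425, 475):
        print(rate, round(fraction_recoverable(rate), 2))
```

The only point of the sketch is the change of regime at the critical rate: nothing is lost to overload below it, and losses grow steadily above it, which is the kind of non-linear relationship the hypothesis is meant to explain.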

Some  neurological  support  for  this  view  is  provided  by  the  findings  of 
clinical  audiology,  reviewed  earlier  (See  Intelligibility  and  Anatomical 
Damage, pg. 11). When hearing losses are due to peripheral conductive disorders,
the curve describing the relationship between amount of compression and the
intensity required for 50% intelligibility is elevated, but not changed in
shape. On the other hand, when hearing losses are due to central disorders,
such as temporal lobe tumors or lesions, or the more diffuse involutional
changes that accompany presbycusis, the threshold curve just mentioned is
not  only  elevated,  but  its  shape  is  substantially  altered.   In  some  cases, 
when  compression  is  moderately  high,  there  is  no  intensity  that  will  produce 
50% intelligibility. It may be that these central disorders [have the effect]
of increasing the time required to process incoming speech, [so that]
the channel is overloaded at a lower word rate.

The  explanation  just  presented  is,  of  course,  quit 
good  deal  of  research  regarding  sentence,  word  and  syllabi 
in  order  to  provide  a  substantial  basis  for  the  hypothesis 
right direction was made by Diehl et al. (1959), who measu
of  selections,  the  word  rates  of  which  had  been  varied  by 
between  words.   However,  since  the  words  they  use  had  not 
time,  the  maximum  word  rate  that  could  be  achieved  by  redu 
between  words  was  still  relatively  slow.   Individual  words 
subjected  to  moderately  high  compression  before  the  method 
rate  by  manipulating  the  intervals  between  words  could  be 
the  word  rates  that  would  be  required  to  test  the  hypothes 
This  research  has  not  yet  been  performed. 


CHAPTER III

The  Intelligibility  and  Comprehension  of 
Time Compressed Speech*


Emerson Foulke**
Thomas G. Sticht


Abstract 

A  listening  passage  and  a  list  of  phonetically  balanced  (PB)  words 
were presented at five compressions in time: 22%, 36%, 46%, 53%, and 59%.
Compression  was  accomplished  by  a  method  which  avoids  distortions  in  vocal 
pitch  and  quality.   Listening  comprehension  and  word  intelligibility  were 
measured  at  each  of  the  five  time  compressions.   The  results  showed  that, 
although  both  intelligibility  and  comprehension  decreased  as  the  percent  of 
compression  was  increased,  comprehension  declined  much  more  rapidly  than 
intelligibility.   An  interpretation  of  the  results  is  given  in  terms  of 
the  differential  perceptual  and  cognitive  tasks  confronting  the  listener 
in  the  comprehension  and  intelligibility  procedures. 
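
The five compressions can be related to word rates with a short calculation. The sketch assumes that percent compression is the fraction of playback time removed, and it takes the 175 wpm master tape rate and the five target word rates from the Method section of this report; the function name is hypothetical.

```python
def compression_for_rate(original_wpm: float, target_wpm: float) -> float:
    """Fraction of playback time that must be removed to reach `target_wpm`."""
    if target_wpm < original_wpm:
        raise ValueError("target rate must not be slower than the original rate")
    return 1.0 - original_wpm / target_wpm


if __name__ == "__main__":
    master_wpm = 175  # word rate of the master tape, as stated in the Method section
    for target in (225, 275, 325, 375, 425):
        percent = 100 * compression_for_rate(master_wpm, target)
        print(f"{target} wpm requires about {percent:.0f}% compression")
```

Rounded to whole percents, the five results agree with the compressions listed in the abstract.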

Time  compressed  speech  is  speech  that  is  reproduced  in  less  time 
than  the  time  required  for  the  original  recording.   A  familiar  method  for 
accomplishing  this  is  the  reproduction  of  a  record  or  tape  at  a  faster 
speed than the one used during recording. However, this method produces
distortions in vocal pitch and quality that interfere seriously with
intelligibility.

Speech  may  also  be  compressed  in  time,  and  without  distortion  in 
vocal  pitch,  by  a  sampling  method  in  which  brief  segments  of  recorded 
speech are periodically discarded and the resulting gaps are closed. The
success of the sampling method depends upon the fact that samples can be
discarded which are so small that the human ear cannot detect their absence.

Compression of this sort may be accomplished manually by removing
short  segments  of  a  recorded  tape  and  splicing  the  free  ends  together  again 
(Garvey,  1953).   If,  for  instance,  every  third  centimeter  of  a  recorded 
tape  were  removed  in  this  manner,  the  resulting  tape  would  be  two-thirds 
the  length  of  the  original  tape,  and  only  two-thirds  as  much  time  would  be 
required  for  its  reproduction. 
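
The logic of the sampling method can be sketched on a sequence of recorded samples. This is only an illustration of the keep-and-discard cycle described above, with the tape represented as a Python list; it is not a description of how any particular compressor is built, and the function and parameter names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def compress_by_sampling(samples: Sequence[T], keep: int, discard: int) -> List[T]:
    """Retain `keep` items, drop the next `discard` items, and repeat.

    The retained segments are abutted, which corresponds to splicing the
    free ends of the tape together.
    """
    if keep <= 0 or discard < 0:
        raise ValueError("keep must be positive and discard non-negative")
    period = keep + discard
    compressed: List[T] = []
    for start in range(0, len(samples), period):
        compressed.extend(samples[start:start + keep])
    return compressed


if __name__ == "__main__":
    tape = list(range(300))  # stand-in for 300 centimeters of tape
    shortened = compress_by_sampling(tape, keep=2, discard=1)
    print(len(shortened) / len(tape))  # two-thirds of the original length
```

Removing every third unit corresponds to keep=2 and discard=1, and the result is two-thirds the original length, as in the example in the text. Because the retained segments are reproduced at their original speed, vocal pitch is not shifted.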

The  manual  sampling  method  is,  of  course,  too  cumbersome  for  most 
purposes. Equipment utilizing a method introduced by Fairbanks (1954)
accomplishes  a  similar  kind  of  compression  by  electromechanical  means. 

The  superiority  of  the  sampling  method  with  respect  to  the 
intelligibility of single words has been demonstrated by Garvey (1953).
He compared the intelligibility of words compressed in time both by the sampling
method and by increasing the playback speed of recorded tape, and found that
listeners could identify a significantly higher percentage of words compressed
in time by the sampling method.


*The research reported here was performed as a part of the Rapid
Speech Project at the University of Louisville, and with the financial
support of the Office of Education, under contract #2430.

**Dr. Emerson Foulke is Director of the Center for Rate Controlled
Recordings and is associate professor of Psychology in the Department of
Psychology at the University of Louisville, Louisville, Kentucky, 40208.

Dr.  Thomas  G.  Sticht  is  a  post  doctoral  Research  Fellow  at  the 
Department of Psychology, University of Pittsburgh, Pittsburgh, Pennsylvania.

The superiority of the sampling method cannot be demonstrated so easily
when  the  listener's  task  is  changed  from  mere  identification  of  words,  as  in 
the intelligibility testing procedure, to the comprehension of connected speech.
Foulke, Amster, Nolan and Bixler (1962) found substantial losses in the
comprehension of listening selections, as indicated by performance on multiple
choice tests, when the selections were compressed enough to produce word rates
in excess of 275 wpm (words per minute). Thus, it appears that compressions
that interfere very little with intelligibility interfere substantially with
comprehension.

In  a  direct  comparison  of  a  listening  selection  compressed  both  by 
the sampling method and by increasing the playback speed of tape, McLain
(1962) found a slight but statistically significant difference in favor of the
sampling method for a selection reproduced at 325 wpm. Foulke (1966a), in
an  experiment  that  presented  a  listening  selection  compressed  by  both  methods, 
and  at  several  accelerated  word  rates,  found  no  differences  in  comprehension 
that  could  be  attributed  to  the  methods  of  compression. 

The  foregoing  evidence,  though  scattered,  suggests  that  although  the 
time  compressed  words  may  be  intelligible  when  they  are  heard  in  isolation, 
the time compressed speech that results when they are heard in meaningful
sequences may not be comprehensible. However, there has been no single
experiment  in  which  intelligibility  and  comprehension  have  been  examined  over 
a  wide  range  of  compressions  in  time.   The  issue  at  stake  here  is  an  important 
one  since  a  definitive  answer  to  the  question  has  important  implications  for 
future  research.   To  the  extent  that  the  problem  is  one  of  loss  of  intelligibility 
of  single  words,  attention  will  be  directed  toward  the  improvement  of  the  equipment 
used  for  time  compression.   To  the  extent  that  the  problem  is  the  increased  rate 
at  which  information  is  fed  to  the  central  nervous  system  when  speech  is  compressed 
in  time,  attention  will  be  directed  to  the  analysis  of  the  demands  placed  upon 
the  perceptual  and  cognitive  processing  functions  of  the  listener  by  time  compressed 
speech.   Because  of  these  considerations,  an  experiment  was  performed  in  which 
the  intelligibility  of  single  words  and  the  comprehension  of  connected  speech 
were  measured  at  several  compressions  in  time. 

Method 

Subjects 

One  hundred  University  of  Louisville  students,  of  both  sexes,  served 
as Ss in the experiment. All were free from any obvious hearing defects and
none  of  them  had  prior  experience  with  time  compressed  speech. 

Apparatus    and    Materials 

Listening  comprehension  was  measured  with  the  listening  subtest  of 
the  Sequential  Test  of  Educational  Progress,  Form  1A,  Part  1.   Form  1A 
consists  of  brief  listening  selections  of  scientific  and  literary  content  that 
are    appropriate  with  respect  to  interest  and  difficulty  for  a  college  freshman 
population. For each selection, there are a few multiple choice questions covering
facts and implications of the selection. Part 1 contains five such selections
and a total of thirty-six questions. Due to an inadvertence, question 17 was
omitted, so that the highest possible test score in the present study was 35.

The  five  listening  selections  were  read  in  a  recording  studio  at  the 
American  Printing  House  for  the  Blind  by  a  professional  reader  employed  in  the 
Talking  Book  Program,  and  were  recorded  on  magnetic  tape  by  an  Ampex  Tape 
Recorder,  Model  300.   This  tape  was  then  compressed  in  time  by  means  of  the 
Tempo Regulator, a device that accomplishes compression by Fairbanks'
sampling method discussed earlier.*

The  master  tape,  recorded  at  a  word  rate  of  175  wpm,  was  reproduced 
on  the  Tempo  Regulator  at  those  compressions  required  to  produce  word  rates 
of  225,  275,  325,  375,  and  425  wpm.   The  output  of  the  Tempo  Regulator  was 
recorded  on  magnetic  tape  and  this  tape  was  reproduced,  during  the  experiment, 
on  a  Wollensak  Tape  Recorder,  Model  T1500.   The  output  of  the  tape  recorder 
was distributed to the Ss through headsets fitted with ear cushions, and
the signal level at each headset could be adjusted by the S for comfortable
listening.
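
The compression required for each word rate follows from simple arithmetic:
if a fraction c of the playing time is removed, a 175 wpm recording is heard
at 175 / (1 - c) wpm. The sketch below, which assumes uniform compression and
uses an illustrative function name that is not part of the original apparatus,
recovers the rounded percentages listed in Table 1.

```python
def compression_for_rate(original_wpm, target_wpm):
    """Fraction of playing time that must be removed to raise the word
    rate from original_wpm to target_wpm (assumes uniform compression)."""
    if target_wpm < original_wpm:
        raise ValueError("target rate must not be below the original rate")
    return 1.0 - original_wpm / target_wpm


ORIGINAL_WPM = 175  # word rate of the master tape
TARGET_RATES = (225, 275, 325, 375, 425)

# Percent of compression for each target rate, rounded to a whole percent.
percents = [round(100 * compression_for_rate(ORIGINAL_WPM, r))
            for r in TARGET_RATES]

for rate, pct in zip(TARGET_RATES, percents):
    print(f"{rate} wpm requires about {pct}% compression")
```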

The  100  words  comprising  a  phonetically  balanced  word  list  (Egan, 
1948) were read by the same reader, prepared in the same manner, and compressed
on  the  Tempo  Regulator  by  the  same  percentages  as  the  listening  selections. 
As  before,  the  output  of  the  Tempo  Regulator  was  recorded  on  tape  and  this 
tape  was  used  in  the  experiment. 

Finally,  a  brief  "warm  up"  listening  selection  was  prepared  at  each 
of  the  compressions  represented  in  the  experiment.   This  selection  was  used 
to promote a common listening set by providing Ss with brief experience in
listening  to  time  compressed  speech  before  participating  in  the  experiment. 

Procedure 

The 100 Ss were distributed among five twenty-member groups. Each
group  heard  material  reproduced  at  one  of  the  compressions  used  in  the 
experiment.   All  of  the  members  in  each  group  listened  to  the  "warm  up" 
passage  first.   Then,  each  group  was  further  divided  into  two  sub-groups. 
The  members  of  one  sub-group  heard  and  were  tested  on  the  listening 
selections  first  and  then  identified,  in  writing,  the  phonetically  balanced 
words,  which  were  presented  one  at  a  time  with  a  five  second  interval 
between  words.   This  order  was  reversed  for  the  other  sub-group,  to  control 
for the possibility of an effect due to order. The same Ss were used for
the measurement of intelligibility and of comprehension in order to suppress
effects  due  to  individual  differences. 

Subjects  were  tested  as  they  became  available.   Therefore,  although 
several Ss were usually tested at a time, occasionally only one S was present
at a testing session. Tests were conducted at a given compression until the
twenty Ss required for an experimental group had been tested. This procedure
was  followed  for  the  five  experimental  groups. 

Results

An  intelligibility  score,  the  percent  of  correctly  identified  PB 
words,  and  a  comprehension  score,  the  percent  of  correctly  answered  multiple 
choice items, were determined for each S. The means and standard deviations
of  these  scores  at  each  of  the  five  time  compressions  represented  in  the 
experiment  are    shown  in  Table  1.   The  effect  of  time  compression  on  intelligibility 
and  comprehension  is  also  shown  in  Figure  1.   In  this  figure,  the  five  time 
compressions employed in the experiment are displayed along the x-axis. The
entry below each compression value refers to the word rate that would result
if connected discourse at a normal word rate of 175 wpm (Johnson, Darley, &
Spriestersbach, 1963, pgs. 202-203) were compressed by that amount. Percent
correct for the two dependent variables is scaled on the y-axis. As the
amount of compression was increased, both intelligibility and comprehension
decreased. However, comparison of the two curves indicates that intelligibility
was always superior to comprehension and that intelligibility was affected much
less than comprehension by increasing the amount of compression.

Figure 1. Intelligibility and comprehension as a function of percent of
compression.

*For further information about speech compression equipment, consult
Gotham Audio Corporation, 2 West 46th Street, N. Y., N. Y., 10036. Readers
interested in obtaining time compressed tapes for research or demonstration
may write to Dr. Emerson Foulke, Director, Center for Rate Controlled
Recordings, University of Louisville, Louisville, Kentucky, 40208.

The  data  upon  which  Figure  1  is  based  were  examined  by  an  analysis  of 
variance.   The  results  of  this  analysis,  presented  in  Table  2,  confirm  the 
impressions  conveyed  by  Figure  1.   Changes  in  intelligibility  and  in  comprehension, 
as well as the interaction of these variables, were significant (p < .001 in
all cases).

TABLE 1

Changes in Intelligibility and Comprehension as a
Function of Percent of Compression in Time

Percent of        Intelligibility       Comprehension
Compression       Mean      S.D.        Mean      S.D.

22%               93%       2.2         73%       12.4
36%               91%       3.0         66%       14.7
46%               89%       3.2         67%       13.0
53%               85%       5.0         56%       12.0
59%               84%       3.7         53%       14.0

TABLE 2

The Analysis of Variance of Intelligibility Scores
and Comprehension Scores

Source                                   df     Mean Square     F       p

Between Ss                               99
  Percent of Compression                  4        1,449        15     <.001
  Error (b)                              95           99
Within Ss                               100
  Intelligibility vs. Comprehension       1       32,462       877     <.001
  Interaction                             4          891         6     <.001
  Error (w)                              95           37
Total                                   199

Discussion

With  respect  to  intelligibility,  the  results  of  the  present 
study are in good agreement with those of Garvey (1953). We found only
a  nine  percent  loss  in  the  intelligibility  of  PB  words  compressed  by  an 
amount  sufficient  to  produce  a  word  rate  of  425  wpm  with  connected  speech, 
assuming  an  original  or  uncompressed  word  rate  of  175  wpm.   At  the  compression 
that  would  be  required  to  accelerate  speech  to  approximately  twice  the 
normal  word  rate,  we  found  only  a  six  percent  loss  in  the  intelligibility 
of  PB  words.   At  a  similar  compression  accomplished  by  the  alternative 
method  of  reproducing  a  tape  at  a  faster  speed  than  the  one  used  during 
recording, Klumpp & Webster (1961) reported a sixty percent loss in
intelligibility.   Garvey  also  found  intelligibility  losses  of  this 
magnitude  when  compression  was  accomplished  by  increasing  the  playback 
speed  of  tape.   Thus,  we  conclude  with  Garvey  that  the  intelligibility  of 
single  words  is  affected  much  less  by  the  sampling  method  than  by  the 
speeded  playback  of  a  tape  or  record.   The  superiority  of  the  sampling 
method  in  this  respect  is  probably  explained  adequately  by  its  freedom 
from  distortion  in  vocal  pitch  and  quality. 

It  was,  of  course,  expected  that  comprehension  scores  would  be 
lower  than  intelligibility  scores.   The  demonstration  of  comprehension 
imposes  a  much  more  complex  task  on  the  listener  than  does  the  demonstration 
of  intelligibility.   The  behavior  upon  which  the  measurement  of  intelligibility 
depends,  implies  registration  of  the  stimulus  word,  some  kind  of  short 
term  memory  storage,  and  the  transduction  of  the  stored  item  to  an 
overt  response.   On  the  other  hand,  the  behavior  on  which  the  measurement 
of  comprehension  is  based,  implies  continuous  registration  and  short 
term  memory  storage  of  stimulus  material,  the  continuous  encoding,  or 
simplification  by  reorganization  and  selective  discarding  of  stimulus 
information  so  that  it  can  be  transferred  to  long  term  memory  storage,  and 
a  final  encoding  step  required  for  the  transduction  of  material  in  long 
term  storage  to  overt  behavior. 

It  is  the  finding  that  the  difference  between  intelligibility 
and  comprehension  scores  increases  as  the  amount  of  compression  is  increased 
that  requires  additional  explanation. 
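
The widening of this difference can be read directly from the means in
Table 1. The short calculation below is only an illustration using those
published means; the variable names are ours. It shows the gap growing from
20 to 31 percentage points overall, although not at every step.

```python
# Mean percent correct from Table 1, keyed by percent of compression.
intelligibility = {22: 93, 36: 91, 46: 89, 53: 85, 59: 84}
comprehension = {22: 73, 36: 66, 46: 67, 53: 56, 59: 53}

# Difference between the two measures at each compression.
gaps = {c: intelligibility[c] - comprehension[c] for c in intelligibility}

for c, gap in gaps.items():
    print(f"{c}% compression: gap of {gap} percentage points")
```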

One  possibility  is  that  the  progressively  larger  loss  in  comprehension 
is  a  consequence  of  the  cumulative  effects  of  the  relatively  smaller  losses 
in  intelligibility.   However,  it  is  well  known  that  it  is  not  necessary 
for  all  of  the  units  of  a  message  to  be  intelligible  in  order  for  the 
message to be received accurately (Miller & Selfridge, 1950; Attneave,
195^). Because of prior learning, the listener is able to reconstruct a
sent message on the basis of reduced cues. He makes use of sequential
probabilities in grammatical speech and the meaningfulness of the heard
message in supplying missed words. A more convincing explanation may be
that when continuous speech is compressed, the number of words per unit
time is increased and the intervals between words are decreased. It
has been shown repeatedly in studies of verbal learning (Miller, 1951,
pg. 212; Osgood, 1953, pg. 505) that the difficulty of a learning task
is increased by increasing the number of items in the list to be learned
and by decreasing the interstimulus interval. To the extent that these
two  situations  are  similar,  an  increase  in  time  compression  may  mean  an 
increased  contribution  of  factors  related  to  task  difficulty.   Such 
factors  would  not  apply  to  the  measurement  of  intelligibility,  as  defined 
in  this  study,  since  its  measurement  required  the  presentation  of  single 
words  in  isolation,  rather  than  connected  sequences  of  words. 

The results of the present study suggest the relevance of a
concept such as channel capacity (Miller, 1953; 1956). According to
this  concept,  a  communication  channel,  in  this  case  the  listener,  has  a 
finite  capacity  for  handling  information.   As  the  amount  of  information 
applied  to  the  input  of  the  channel  is  increased,  there  is  a  corresponding 
increase  in  the  amount  of  information  transmitted  by  the  channel,  until 
channel  capacity  is  reached.   Further  increases  in  the  amount  of  input 
information  cannot  be  handled  by  the  channel,  with  the  result  that  some 
information  is  lost.   Assuming  normal  speech  to  occur  at  a  rate  that  is 
well  below  channel  capacity,  increasing  word  rate  should  have  little 
effect  upon  comprehension  initially.   However,  as  the  word  rate  begins 
to  approach  channel  capacity,  there  should  be  an  acceleration  of  loss 
in  comprehension  and,  when  channel  capacity  is  exceeded,  comprehension 
should fall off very rapidly. The comprehension curve in Figure 1
resembles  a  positively  accelerated  decreasing  function,  although  not  enough 
values  for  the  word  rate  variable  were  determined  to  test  this  suggestion. 
However,  the  results  of  other  studies  (Foulke,  et  al.,  1962,  1964)  have 
also  suggested  that  comprehension  is  a  positively  accelerated  decreasing 
function  of  word  rate. 
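
The channel capacity argument can be made concrete with a deliberately crude
sketch. It assumes a hard capacity limit, treats word rate as a stand-in for
information rate, and uses a capacity of 275 wpm purely for illustration;
none of these values is a measured quantity, and the real comprehension curve
presumably bends more gradually than this idealization.

```python
def transmitted(input_rate, capacity):
    """Information passed by an idealized channel with a hard limit."""
    return min(input_rate, capacity)


def lost(input_rate, capacity):
    """Information that exceeds the limit and is therefore not passed."""
    return max(0.0, input_rate - capacity)


CAPACITY_WPM = 275  # hypothetical capacity, chosen for illustration only

for rate in (175, 225, 275, 325, 375, 425):
    print(rate, transmitted(rate, CAPACITY_WPM), lost(rate, CAPACITY_WPM))
```

Below the assumed capacity nothing is lost; above it, every further increase
in input rate appears entirely as loss.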

Silent  visual  reading  rates  considerably  in  excess  of  275  wpm, 
the  word  rate  at  which  listening  comprehension  generally  begins  to 
decline  rapidly,  are   commonplace.   However,  because  of  the  spatial  display 
of  information  on  the  printed  page,  the  reader  is  able  to  perform  the 
perceptual  operation  referred  to  by  Miller  (1956),  as  "chunking".   In 
order  to  keep  the  rate  of  information  input  below  his  channel  capacity, 
the  fast  visual  reader  reduces  the  number  of  elements  with  which  he  must 
contend  by  combining  the  elements  given  by  the  structure  of  language 
into  larger  elements.   He  begins  to  perceive  not  just  single  words,  but 
entire  phrases  or  sentences.   Because  of  the  temporal  display  of  information 
presented  aurally,  the  listener  cannot  perform  this  operation. 

The  data  required  to  test  the  explanation  offered  here  are  not 
yet  available.   One  clear  task  for  future  research  is  a  more  careful 
determination  of  the  relationship  between  word  rate  and  comprehension. 
If,  after  further  investigation,  the  attempt  to  determine  the  differential 
effect  of  increasing  word  rate  on  intelligibility  and  comprehension  of 
compressed speech is convincing, it will have important practical implications.
If  the  inability  to  show  good  comprehension  of  very  rapid  speech  is  found 
to  be  a  consequence  of  a  verbal  input  that  has  been  rendered  incompatible 
with  the  human  perceptual  mechanism  because  channel  capacity  has  been 
exceeded,  current  efforts  to  train  for  comprehension  of  very  rapid  speech 
cannot  be  expected  to  have  much  effect.   This  conclusion  is  not  contradicted 
by past efforts at training. Such efforts have not, in the main, been
successful (Foulke, 1964; Voor & Miller, 1965). However, the task of
defining an adequate training experience has only begun, and further
efforts along this line are now in progress (Orr, Friedman, & Williams,
1965). 

If,  on  the  other  hand,  loss  in  comprehension  turns  out  to  be 
primarily  a  consequence  of  words  that  are  less  intelligible  because  of 
the  deterioration  of  signal  quality  that  is  inherent  in  the  time  compression 
of  speech  by  the  sampling  method,  other  directions  for  research  are    indicated. 
For  instance,  one  might  consider  further  engineering  refinements  of  the 
equipment  used  for  the  time  compression  of  speech,  with  a  view  to  improving 
signal  quality.   One  might  also  consider  a  training  program  designed  to 
promote  the  comprehension  of  highly  compressed  continuous  speech  by  teaching 
listeners  to  discriminate  and  identify  words  and  phrases  that  are  rendered 
unfamiliar  by  virtue  of  having  been  greatly  compressed  in  time. 

In any case, one conclusion, supported by results present and past,
is that speech may be compressed enough to result in a significant savings
in time without interfering seriously with the comprehension that a
motivated  listener  is  able  to  show.   Many  blind  students  should,  for 
instance,  be  able  to  make  effective  use  of  material  compressed  to  250 
or  275  wpm.   This  amount  of  compression,  though  moderate,  results  in  a 
large  enough  savings  in  time  to  be  valuable  to  a  busy  student  and  we  have 
shown  repeatedly  that  selections  prepared  at  word  rates  in  this  neighborhood 
can  be  comprehended  easily.   Beyond  approximately  275  wpm,  the  loss  in 
comprehension  begins  to  become  serious  and  more  research  will  clearly  be 
required  before  it  is  possible  to  make  practical  use  of  very  rapid  speech. 


CHAPTER  IV 
Computers  For  Speech  Time  Compression 

Robert  J.  Scott* 


I  would  like  first  to  review  ways  in  which  speech  can  be  time-compressed 
and  then  to  discuss  ways  in  which  computers  can  aid  us  in  this  task.   To  compress 
speech  we  can  manipulate  the  complex  waveform  of  speech  directly  or  transform 
the  speech  from  the  time  domain  to  the  frequency  domain.   The  simplest  time 
domain  method  is  to  replay  a  record  or  tape  of  speech  at  a  speed  faster  than 
that  used  for  recording.   The  frequency  distortion  or  shift  is  proportional 
to the increase in playback speed. This method, even with its spectral
distortion, is used by many blind listeners. In all the methods shown we are
making  compromises  of  one  sort  or  another  to  produce  accelerated  speech. 
Only  because  of  the  high  temporal  redundancy  in  the  speech  signal  are   we 
able  to  perform  such  operations  on  the  speech  and  still  retain  an  acceptable 
level  of  intelligibility. 

As  can  be  seen  in  Figure  1  we  have  two  avenues  of  attack.   One  is 
to  manipulate  the  speech  signal  in  the  time  domain  which  is  difficult  because 
of  the  inverse  relationship  between  time  and  frequency.   By  far,  the  most 
popular  method  is  that  of  periodically  discarding  speech  segments  as  is 
done  in  the  Fairbanks  method  of  time  compression  using  the  Tempo-Regulator. 
The  digital  simulation  will  be  discussed  later.   Another  method  selectively 
discards  voicing  periods  or  other  speech  events.   The  last  method  in  the 
time  domain  shortens  the  duration  of  pauses.   The  other  avenue  of  attack 
relieves  us  of  the  troublesome  time-frequency  relationship  that  we  have 
in  dealing  with  the  complex  speech  waveform.   By  making  an  approximate 
Fourier transformation of the speech signal we do away with the phase
relationships between components of the speech signal and relate spectral power
to  time.   This  is  done  using  a  channel  vocoder  or  filter  bank.   Once  the 
speech  is  so  transformed  and  recorded,  a  time  compression  followed  by  another 
transformation  back  to  the  time  domain  produces  accelerated  speech.   The 
process  is  analogous  to  making  a  sonogram  of  speech  on  a  stretched  sheet  of 
rubber  and  then  allowing  the  sheet  to  relax  in  the  time  dimension  to  produce 
a  time  compression.   The  formant  vocoder  extracts  the  formants  of  the 
speech  signal.   Here  we  relate  formant  positions  in  the  spectrum  to  time 
and  can  make  a  time  compression  of  this  representation  before  synthesis  to 
produce  accelerated  speech.   There  are  variations  in  the  vocoder  techniques 
but  all  involve  time  to  frequency  transformations  before  compression  and 
produce  varying  effects  of  distortion  on  the  final  accelerated  speech. 

Just  as  the  Tempo-Regulator  provided  us  with  an  automatic  method 
of  time-compression  far  superior  to  Garvey's  early  hand-patching  attempts, 
the  computer  offers  us  an  opportunity  to  try  more  sophisticated  time- 
compression  techniques.   No  one  would,  for  example,  dispute  the  fact  that 
every  speech  segment  does  not  contribute  the  same  amount  of  information 
to  the  overall  speech  sample  yet  in  the  Tempo-Regulator  we  discard  segments 
periodically  independent  of  the  information  in  the  segments  being  discarded. 
The  computer  allows  us  to  develop  algorithms  for  selectively  discarding 
speech  segments  based  on  their  contributions  to  the  overall  intelligibility. 
This can be done both in the time domain and the frequency domain.

Figure  2  diagrams  a  hybrid  computer  system  that  has  been  used  effectively 
for  producing  time-adjusted  speech.   Here  we  have  the  ability  to  compress 
speech  both  in  the  time  domain  through  direct  analog  to  digital  conversion 
of  the  speech  waveform  as  well  as  in  the  frequency  domain  by  first  passing 
the speech through a vocoder analyzer. The digitized speech signals may be
stored on digital magnetic tape for later processing or modified by the digital
computer and reconstructed into an analog speech signal.

METHODS FOR SPEECH TIME COMPRESSION

A. Manipulating the Speech Complex Waveform

1. Speed-up of recorded speech
2. Tempo-Regulator
3. Digital simulation of Fairbanks' method
4. Pitch period removal
5. Pause shortening

B. Time to Frequency Transformation

1. Filter bank or channel vocoder
(Linear or nonlinear compression)
2. Formant vocoder
(Linear or nonlinear compression)

Figure 1. Methods for Speech Time Compression

*Dr. Robert J. Scott is with the Range Measurement Lab., Patrick
Air Force Base, Florida.

There  is  concern  over  the  effect  of  changes  in  pause  duration  on  speech 
intelligibility.   The  computer  can  be  used  effectively  to  detect  and  time- 
adjust  pauses  in  speech.   The  speech  to  be  altered  is  first  digitized  in 
the  computer.   Then  the  computer  program  using  carefully  selected  thresholds 
scans  the  digital  recording  with  a  150  millisecond  "window"  searching  for 
pauses  of  at  least  this  duration.   Once  a  pause  is  detected  it  can  be 
lengthened,  shortened,  or  completely  eliminated  under  program  control. 
It  may  be  necessary  when  considering  the  fatigue  factor  of  prolonged 
exposure  to  accelerated  speech  to  lengthen  certain  pauses  rather  than 
shorten  them  and  the  computer  can  handle  this  task  equally  well. 
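
The pause procedure just described might be sketched as follows. This is a
minimal illustration and not the program actually used: it assumes a single
amplitude threshold applied sample by sample, represents an adjusted pause as
digital silence, and uses function names of our own choosing.

```python
def find_pauses(samples, sample_rate, threshold, min_pause_s=0.150):
    """Return (start, end) index pairs for runs of low-amplitude samples
    lasting at least min_pause_s seconds. The end index is exclusive."""
    min_len = int(round(min_pause_s * sample_rate))
    pauses = []
    run_start = None
    for i, x in enumerate(samples):
        if abs(x) < threshold:
            if run_start is None:
                run_start = i  # a quiet run begins here
        else:
            if run_start is not None and i - run_start >= min_len:
                pauses.append((run_start, i))
            run_start = None
    # A quiet run may extend to the end of the recording.
    if run_start is not None and len(samples) - run_start >= min_len:
        pauses.append((run_start, len(samples)))
    return pauses


def adjust_pauses(samples, pauses, scale):
    """Rebuild the signal with each detected pause scaled in length.
    scale < 1 shortens, scale > 1 lengthens, scale == 0 removes."""
    out = []
    pos = 0
    for start, end in pauses:
        out.extend(samples[pos:start])
        new_len = int(round((end - start) * scale))
        out.extend([0] * new_len)  # silence stands in for the pause
        pos = end
    out.extend(samples[pos:])
    return out


# Toy signal at 1,000 samples per second: speech, a 300 ms pause,
# speech, a 100 ms gap (too short to count), and speech again.
RATE = 1000
signal = [100] * 200 + [0] * 300 + [100] * 200 + [0] * 100 + [100] * 200
found = find_pauses(signal, RATE, threshold=10)
halved = adjust_pauses(signal, found, scale=0.5)
print(found, len(signal), len(halved))
```

Passing a scale greater than one lengthens the same pauses, which corresponds
to the fatigue consideration mentioned above.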

Time  does  not  permit  exploring  some  of  the  nonlinear  ways  in  which 
we  can  time  compress  speech  using  vocoders  but  in  our  research  we  have 
examined  a  compression  technique  based  on  a  constant  rate  of  change  of 
spectral  energy  which  might  be  useful  when  used  in  conjunction  with  other 
compression  techniques.   As  many  researchers  have  already  discovered  certain 
speech  events  retain  their  identity  under  time  compression  better  than  others. 
Our  research  in  this  area  suggests  that  if  a  nonlinear  method  of  time- 
compression  is  to  work,  it  must  time-adjust  only  portions  of  the  speech 
where  such  time-adjustments  can  be  tolerated  without  introducing  phonemic 
confusion.   This  confusion  may  occur  in  speech  wherever  duration  is  a 
cue  to  phonemic  discrimination. 

Simulation  of  the  Fairbanks  device  was  attempted  on  the  digital 
computer  and  was  achieved  quite  simply.   The  speech  waveform  was  first 
digitized,  blocked,  and  stored  on  digital  magnetic  tape  sampled  at  20,000 
times  per  second  using  9-bit  samples.   The  digitized  tape  was  then  read 
into  the  PDP-1  computer,  appropriate  segments  of  the  waveform  discarded 
for  time  compression,  and  then  processed  through  the  digital  to  analog 
converter  at  the  same  sampling  rate.   Within  the  limits  of  the  digitized 
signal,  arbitrary  sample  and  discard  intervals  can  be  selected.   Improvement 
in  the  quality  of  the  time-adjusted  speech  was  made  through  digital  smoothing 
of  the  discontinuities  at  the  segment  boundaries.   With  the  computer  we  can 
determine  the  amount  of  information  lost  in  using  large  discard  intervals 
for high degrees of compression. For example, a passage can be accelerated
to 525 words per minute using a 20 millisecond sample interval and a 40
millisecond  discard  interval.   Then,  by  playing  each  20  millisecond  segment 
in  the  compressed  speech  three  times  we  can  restore  the  original  time 
frame  and  check  on  intelligibility  loss. 
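
The sample-and-discard operation, and the threefold repetition used to
restore the original time frame, can be illustrated with a short sketch. It
is not the PDP-1 program: the function names are ours, boundary smoothing is
omitted, and the figure of 525 wpm is reproduced only under the assumption
that the uncompressed passage runs at 175 wpm.

```python
def sample_and_discard(samples, sample_rate, keep_ms, discard_ms):
    """Split the signal into retained segments, dropping what lies between.
    Keeps keep_ms of signal, then skips discard_ms, periodically."""
    keep = int(sample_rate * keep_ms / 1000)
    discard = int(sample_rate * discard_ms / 1000)
    segments = []
    pos = 0
    while pos < len(samples):
        segments.append(samples[pos:pos + keep])
        pos += keep + discard
    return segments


def restore_time_frame(segments, repeats):
    """Play each retained segment `repeats` times in succession."""
    out = []
    for seg in segments:
        out.extend(seg * repeats)
    return out


SAMPLE_RATE = 20000                      # samples per second, as in the text
signal = list(range(12000))              # 0.6 s of stand-in "speech"
segments = sample_and_discard(signal, SAMPLE_RATE, keep_ms=20, discard_ms=40)
compressed = [x for seg in segments for x in seg]
restored = restore_time_frame(segments, repeats=3)

speedup = (20 + 40) / 20                 # ratio of original to compressed time
print(len(compressed), len(restored), 175 * speedup)
```

The restored signal has the original duration but contains only the retained
third of the material, so any loss of intelligibility on listening reflects
what the discard intervals removed.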

The  Fairbanks  method  reduces  the  intelligibility  of  the  speech  by 
discarding  arbitrary  portions  of  the  speech  signal  which  may  be  essential 
in  retaining  intelligibility  in  the  compressed  speech.   I  would  like  to 
suggest  a  method  of  dichotic  listening  which  presents  the  compressed  speech 
to  one  ear   and  portions  of  the  discarded  speech  to  the  other  ear.   By  dichotic 
listening  we  mean  presenting  unique  signals  to  each  ear.   In  order  to  present 
the speech dichotically, the compressed speech is presented to one ear and
portions  of  the  discarded  intervals  are  joined  sequentially  and  presented  to 
the  other  ear. 

The  example  chosen  for  illustration  in  Figure  3  involves  a  two-to- 
one  time  compression.   For  equal  sample  and  discard  intervals  there  is  no 
information  lost.   The  odd-numbered  segments  of  the  speech  signal  are 
directed  to  one  ear   while  the  even-numbered  segments  are  directed  to  the 
other  ear.   The  segments  at  one  ear  are  staggered  in  time  with  respect  to 
the  other  ear    to  keep  segment  discontinuities  from  occurring  at  both  ears 
simultaneously.

Figure 3. Segment Relationships in Dichotic Compression
Using 50 Per Cent Compression.
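
The two-to-one dichotic arrangement amounts to dealing alternate segments to
the two ears. The sketch below is an assumption-laden illustration: the
function name is ours, and both the staggering of one channel and the
boundary smoothing are left out because the amount of offset is not stated.

```python
def dichotic_split(samples, segment_len):
    """Send odd-numbered segments to one ear and even-numbered segments
    to the other. Segments are numbered from 1."""
    segments = [samples[i:i + segment_len]
                for i in range(0, len(samples), segment_len)]
    left, right = [], []
    for n, seg in enumerate(segments, start=1):
        if n % 2 == 1:
            left.extend(seg)    # segments 1, 3, 5, ...
        else:
            right.extend(seg)   # segments 2, 4, 6, ...
    return left, right


signal = list(range(8))
left, right = dichotic_split(signal, segment_len=2)
print(left, right)
```

Each channel lasts half as long as the original, and between them the two
channels contain every sample, which is the sense in which no information is
lost at this compression.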

It seems that the intelligibility of compressed speech would be improved
by using both ears and making as much of the signal as possible available to
the listener, even though the signals presented to the two ears are not in
their proper time relationship. The results of a psychoacoustic test which
I will describe in a moment suggest that fusion of the two distinct signals
takes  place  in  the  central  nervous  system  and  apparently  restores  the  temporal 
order  of  the  signals  during  the  process  of  identification.   The  dichotic 
time  compression  was  performed  using  a  digital  computer.   The  tape-recorded 
speech  was  first  converted  to  digital  speech  samples  and  stored  on  magnetic 
tape using the digital computer as a buffer. The sampling rate was 20,000
per  second  using  9-bit  samples.   The  appropriate  samples  were  then  taken 
from  tape  under  computer  control  and  again  stored  on  digital  tape  in  the 
proper  format  for  dichotic  compressed  speech.   The  samples  to  the  two 
ears  were  stored  alternately  for  replay  through  two  channels  of  a  digital- 
to-analog  converter.   Some  digital  smoothing  was  done  at  the  segment  boundaries. 

The  dichotic  compression  could  be  done  mechanically  as  shown  in 
Figure 4. Here we have two quadripole head assemblies mounted on the same
shaft but offset with respect to each other by 45 degrees. Each head is
similar  in  design  to  Fairbanks'  device  with  the  tape  passing  over  the  assembly 
making  a  total  change  in  direction  of  90  degrees  and  in  contact  with  one- 
fourth  of  the  surface  of  the  rotating  head  assembly.   The  direction  of 
the  rotating  heads  relative  to  the  tape  direction  determines  whether  compression 
or  expansion  of  the  speech  occurs  and  the  relative  speeds  determine  the  amount 
of compression or expansion. For the dichotic compression a two-track tape
would  be  used  with  the  same  speech  signal  recorded  on  each  track.   Each  of  the 
two  heads  would  contact  its  own  track  on  the  tape. 

A  preference  test  was  conducted  to  compare  the  intelligibility  of 
dichotic  time  compression  and  Fairbanks'  method  of  time  compression.   The 
procedure  in  which  a  listener  can  select  alternately  one  of  the  two  modes 
of  compression  while  hearing  continuous  time-compressed  speech  is  illustrated 
in  Figure  5.   The  speech  was  first  time  compressed  dichotically  using  the 
hybrid  computer  and  then  recorded  onto  two  tracks  of  an  audio  tape.   In 
one  mode  of  the  test,  the  listener  heard  a  single  track  at  both  ears  through 
a  binaural  headset.   This  is  the  Fairbanks'  method.   In  the  other  mode,  the 
listener  heard  a  different  track  at  each  ear.   This  is  the  dichotic  method. 
The  listener  selectively  switched  from  one  mode  to  the  other  until  he  decided 
which  was  the  more  intelligible.   No  a  priori  information  concerning  the  signal 
structures  for  the  two  modes  was  given  to  the  test  subjects. 

The listeners selected for the comparison test reported no gross
hearing impairments and most of them had not heard compressed speech before.
The test was conducted on 14 subjects. Eleven of the fourteen selected the
dichotically compressed speech as the mode which they considered the more
intelligible.

If  we  accept  the  validity  of  the  observer  judgements  of  the  relative 
intelligibility  of  the  two  modes  of  compressed  speech,  then  the  results 
of  this  test  indicate  that  the  central  nervous  system  is  capable  of  making 
use  of  additional  information  presented  to  the  alternate  ear   of  a  listener 
if the absence of such information severely restricts the speech intelligibility.

I  hope  that  I  have  aroused  interest  in  those  of  you  who  have  used 
only  the  Tempo-Regulator  for  speech  compression.   There  are    problems,  however, 
in  using  computers  for  speech  time  compression.   The  most  important  are    the 
lack  of  suitable  computer  facilities  for  most  researchers  and  the  high  cost 
of  using  such  computers.   The  computer  cannot  perform  any  tasks  of  speech 
compression  which  have  not  already  been  solved  in  the  programmer's  mind.   The 
computer  does,  however,  release  the  scientist  from  the  restraints  otherwise 
imposed  and  opens  new  avenues  of  attack  in  the  speech  compression  area. 

Figure 5. Relative Phases of the Speech
Segments at the Two Ears for 50 and 66.7 Per Cent
Dichotic Compression.


CHAPTER  V 

Report  On  Studies  In  Speech  Compression 
Conducted  In  The  Spring  of  1966 

By 
The  Hadley  School  For  The  Blind 


The  Place  Of  Speech  Compression  In  Academic  Study 


Mrs. Rose Diamond*
Richard  Kinney 

The  purpose  of  this  study  in  Speech  Compression,  assigned  to  the 
Hadley  School  by  Mr.  Robert  Bray,  Chief,  Division  for  the  Blind,  Library 
of  Congress,  at  the  Conference  on  Time  Compression  in  Speech  called  by  him 
at  the  Library  in  December,  1965,  was  to  test  acceptance  and  effectiveness 
of  moderate  compression  (250  words  per  minute)  in  the  study  of  an  academic 
subject.   Participating  in  the  study  were  63  volunteer  students,  most  of 
them  students  of  the  Hadley  School. 

Two  chapters  of  a  course  in  United  States  Citizenship  were  selected 
for the test and recorded on tape. A compressed version of the narration
was also made. The original and the compressed version were each transferred
to 12" records. Each lesson consisted first of a chapter of informational
text on the subject being studied and second of a set of questions to be
answered. Students were supplied with 8-1/2 x 11 cardboard test forms, with
ten rows of five holes each, on which to mark their answers to the
multiple-choice questions. Almost all students took the "course" in their own homes,
and  submitted  their  reports  to  the  Hadley  School  in  the  same  way  they  would 
the  lessons  of  a  Hadley  correspondence  course. 

Students were invited to send, along with their test reports, their reactions
to compressed speech. These have proved interesting and enlightening.
Some of the highlights of students' remarks are quoted in the last column of
the group reports. Several letters of special interest have been reproduced
in full.

The  particular  value  of  this  study  is  felt  to  be  in  the  longer  exposure 
to  speech  compression  which  the  tests  have  provided.   Most  previous  tests 
involving  compressed  speech  have  been  based  on  samplings  of  only  several 
minutes  in  length,  whereas  the  two  lessons  in  United  States  Citizenship,  in 
the  original  version,  cover  more  than  an  hour  of  listening.   These  tests 
would  seem  to  provide  the  basis  for  a  valid  judgement  as  to  whether  moderate 
degrees of compression could be safely applied to recorded courses of study--
in the case of the Hadley School, specifically to correspondence courses offered
on tape or discs. By "moderate degrees" we mean compression anywhere above
normal  reading  speed  up  to  250  words  a  minute--the  degree  of  compression  used 
in  the  present  tests. 

We have tried to present the results in as complete a way as possible:
first  through  a  listing  of  the  data  regarding  each  participant,  including  the 
grade  on  each  of  the  two  lessons,  and  any  significant  comments  made  by  the 
student;  next  through  graphs  showing  the  achievement  curve  of  the  participants 
in  each  group;  and  lastly  through  a  simple  percentage  graph  showing  the  average 
grade  of  each  group. 


*The project reported in this paper was directed by Mrs. Rose Diamond,
Project Director for the Hadley School for the Blind, 700 Elm Street, Winnetka,
Illinois 60093. The project report was read at the Louisville Conference by
Dr. Richard Kinney, also of the Hadley School for the Blind.

It was our hope that the results might show that a course of study
presented  with  moderate  compression  is  at  least  as  easy  to  study  as  a  course 
presented  at  normal  reading  speed.   We  are  pleased  to  find  that  the  average 
grade of the compressed speech group--Group A--is a little higher than that of
the  Control  Group.   The  comments  and  letters  of  students  in  Group  A  confirm 
the  fact  that  good  students  like  to  listen  to  their  lesson  material  read  at 
faster  than  normal  reading  speed.   As  for  the  addition  of  rapid  playback  to 
speech compression, the results show that students certainly cannot be
required to listen in this way. As the low average grade of Group B (55)
shows, a serious problem in comprehension exists for the
average  student.   Students  who  have  trained  themselves  to  listen  to  fast 
playback,  however,  apparently  can  clearly  comprehend  recorded  material  that 
sounds  like  squirrel  chatter  to  the  untrained.   The  excellent  work  of  several 
students  in  this  group  is  impressive. 

The  test  imposed  a  necessary  handicap  on  Group  B  through  fixed  rates 
of  playback.   Without  these  fixed  rates,  however,  there  would  have  been  no 
way  of  knowing  what  playback  speeds  were  used.   Side  one  on  the  record  was 
processed at 33 1/3 R.P.M., and was to be played back at 45 R.P.M., giving
an increase in speed of about 34%. Side two was processed at 16 2/3 R.P.M.
and was to be played back at 24 R.P.M., giving an increase in speed of about
50%. At least one student was able to comprehend side two played back at 33
1/3 R.P.M.--an increase of 100%, or 500 words a minute!
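
The speed increases described above follow from the ratio of playback speed to processing speed. The short sketch below works through that arithmetic. It assumes that the compressed record runs at 250 words per minute when played at the speed at which it was processed, as stated for Group A; the function and constant names are illustrative only. The exact ratios work out to 35 and 44 per cent for the two prescribed speeds, somewhat different from the approximate figures given in the report, and to 100 per cent for side two at 33-1/3 R.P.M.

```python
from fractions import Fraction

# Turntable speeds named in the report, kept as exact fractions.
RPM_16 = Fraction(50, 3)    # 16-2/3 R.P.M.
RPM_33 = Fraction(100, 3)   # 33-1/3 R.P.M.

# Assumption: the compressed record runs at 250 words per minute
# when played at the speed at which it was processed.
BASE_WPM = 250


def speed_ratio(recorded_rpm, playback_rpm):
    """Factor by which speech is speeded when a record processed at
    recorded_rpm is played back at playback_rpm."""
    return Fraction(playback_rpm) / Fraction(recorded_rpm)


def percent_increase(recorded_rpm, playback_rpm):
    """Increase in speed, in per cent, over correct playback."""
    return float((speed_ratio(recorded_rpm, playback_rpm) - 1) * 100)


def words_per_minute(base_wpm, recorded_rpm, playback_rpm):
    """Speaking rate heard by the listener at the faster playback."""
    return float(base_wpm * speed_ratio(recorded_rpm, playback_rpm))


if __name__ == "__main__":
    cases = [
        ("Side one at 45 R.P.M.", RPM_33, 45),
        ("Side two at 24 R.P.M.", RPM_16, 24),
        ("Side two at 33-1/3 R.P.M.", RPM_16, RPM_33),
    ]
    for label, recorded, playback in cases:
        increase = percent_increase(recorded, playback)
        rate = words_per_minute(BASE_WPM, recorded, playback)
        print(f"{label}: +{increase:.0f}%, {rate:.1f} words per minute")
```

Exact fractions are used so that 33-1/3 and 16-2/3 do not introduce rounding error into the ratios.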

The  remarks  of  students,  and  particularly  the  letters  which  we  have 
included  in  this  report,  are    the  most  interesting  and  perhaps  the  most 
valuable  part  of  the  project.   Most  of  the  comments  of  the  students  in  Group 
A  are  favorable.   The  one  letter  which  we  are  including  in  full  is  particularly 
interesting.   The  student  does  think  she  might  have  done  better  if  she  could 
have had more time--but considering the fact that she received a grade of
90  on  each  test,  we  wonder  how  much  better  she  could  have  done! 

Several  students  commented  that  the  questions  were  read  too  fast. 
There  is  certainly  no  point  in  imposing  this  handicap  on  students  of  regular 
courses.   Our  plan,  and  our  recommendation,  is  to  present  questions  to  be 
answered  not  only  at  normal  reading  speed,  but  with  enough  space  between 
questions  so  that  the  student  can  stop  the  record  after  a  question,  answer 
the  question,  and  then  start  the  record  again,  without  losing  any  of  the 
following  question.   The  fact  that  the  students  made  such  a  good  showing 
with  this  added  handicap  is  an  extra  reinforcement  to  the  results  achieved. 

The  letters  from  students  of  Group  B--the  least  likely  group  to 
produce  any  results--are  especially  interesting.   We  call  attention  to  the  first 
letter  from  this  group  from  student  #33,  who  achieved  the  best  score  in  Group 
B  and  who  has  provided  some  valuable  criticism  and  comment  on  the  project 
and  on  speech  compression  in  general.   Most  students  seem  to  have  had  quite 
a  struggle  with  the  extremely  fast  reading  speeds  produced  by  rapid  playback 
superimposed  on  the  compressed  recording;  but  the  fact  that  so  many  were  able 
to  understand  what  they  heard  and  to  come  up  with  a  good  score  is  impressive. 
From  the  participants  of  Group  B,  who  had  the  most  difficult  assignment  of 
all,  we  have  probably  learned  more  than  from  any  of  the  others  who  had  easier 
assignments.

We are grateful to Robert Bray and the Division for the Blind of the
Library of Congress for their cooperation--particularly to Mr. Al Korb for the
loan of 4-speed Talking Book machines and variable playback-control units for
use in this study.

Special acknowledgement is due to Mrs. Rose Diamond, Project Director,
who  took  capable  charge  of  the  innumerable  details  and  problems  of  this  three- 
month  project,  and  carried  it  through  on  time  to  a  successful  conclusion. 
Special credit is also due our Recording Engineer, Mr. Charles Shipley, for
directing  the  recording  operations,  and  providing  us  with  the  required  records 
as  needed.   We  want  to  express  thanks  also  to  Mr.  George  Wolman  of  Universal 
Recording  Studios  of  Chicago  for  his  cooperation  in  compressing  the  material 
as  a  service  to  the  Hadley  School. 

As  a  follow-up  to  this  study,  and  particularly  in  consideration  of 
its  favorable  results,  the  Hadley  School  is  planning  to  apply  speech  compression 
in the range of 20% to 30% to selected future courses, as recorded, in order to
set  in  motion  long-range  field  tests  that  may  further  evaluate  the  practicality 
of  speech  compression  as  applied  to  recorded  textual  materials  used  in  formal 
academic  studies. 
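
What a compression of 20% to 30% means in words per minute depends on how the percentage is defined. The sketch below assumes that the percentage is the fraction of the original playing time that is removed, and it takes as its starting point the normal reading speed of 160 words per minute given for the Control Group record. Both the definition and the function names are assumptions made for illustration.

```python
def compressed_wpm(normal_wpm, fraction_removed):
    """Word rate after removing the given fraction of playing time.

    Assumes "compression" means the fraction of the original
    duration that is discarded (0.25 for 25% compression).
    """
    if not 0 <= fraction_removed < 1:
        raise ValueError("fraction_removed must be at least 0 and below 1")
    return normal_wpm / (1 - fraction_removed)


def fraction_removed(normal_wpm, target_wpm):
    """Fraction of playing time that must go to reach target_wpm."""
    if target_wpm < normal_wpm:
        raise ValueError("target rate must not be below the normal rate")
    return 1 - normal_wpm / target_wpm


if __name__ == "__main__":
    normal = 160  # words per minute, from the Control Group description
    for pct in (20, 30):
        rate = compressed_wpm(normal, pct / 100)
        print(f"{pct}% compression: about {rate:.0f} words per minute")
    removed = fraction_removed(normal, 250)
    print(f"250 words per minute: about {removed:.0%} of playing time removed")
```

Under this definition, 20% to 30% compression corresponds to roughly 200 to 229 words per minute, and the 250 words per minute used in the present tests corresponds to removing about 36% of the playing time.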

It  should  be  added  that  the  project  has  emphasized  the  fact  that  the 
real  question  in  the  application  of  speech  compression  to  academic  study  is 
not  one  of  speed  of  reading,  for  many  students  can  apparently  adapt  to  extremely 
rapid  playback,  even  up  to  500  words  a  minute.   The  real  problem  seems  to  be 
what  the  present  method  of  compression  does  to  the  quality  of  the  recording. 
The  evenly  spaced  clipping  of  segments  of  a  recording  produces  a  random 
clipping  of  the  irregularly  spaced  recorded  sounds.   Repeatedly  the  axe  falls 
upon  a  short  but  important  consonant,  with  unfortunate  results,  as  when  "student" 
turns into "sudent", or "statistics" into "statissics".
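
The effect described above can be illustrated with a minimal sketch of fixed-interval sampling, in which stretches of constant length are alternately kept and discarded with no regard for where a speech sound begins or ends. The cycle lengths, the use of one letter to stand for one short unit of sound, and the function name are assumptions chosen for illustration; they are not the settings used in preparing the test records.

```python
def sample_compress(units, keep, discard, offset=0):
    """Keep `keep` units, drop the next `discard`, and repeat.

    `offset` shifts where the fixed keep/discard cycle starts, standing
    in for the arbitrary alignment between the cycle and the speech.
    """
    if keep <= 0 or discard < 0:
        raise ValueError("keep must be positive and discard non-negative")
    period = keep + discard
    return [u for i, u in enumerate(units) if (i + offset) % period < keep]


if __name__ == "__main__":
    # Stand-in "recording": each letter represents one short unit of sound.
    recording = "student statistics"
    for offset in range(4):
        kept = "".join(sample_compress(recording, keep=3, discard=1, offset=offset))
        print(f"cycle offset {offset}: {kept!r}")
```

Because the cycle is fixed while the sounds are irregularly spaced, which units are lost changes with every shift of the cycle, and a short consonant is as likely to be discarded as part of a long vowel.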

Your  professional  reader  repeatedly  gets  tongue-tied,  and  if  the 
reader  is  a  top  one  the  effect  is  startling,  comic,  insulting. 

Some  method  of  speech  compression  which  will  speed  up  the  reading  not 
only  without  a  change  in  pitch  of  the  voice,  but  also  without  distortion,  needs 
to  be  developed  as  soon  as  possible  in  order  to  assure  the  continued  acceptance 
and  application  of  compressed  speech. 

Donald  Wing  Hathaway 
Executive  Director  and 
Director  of  Education 
The  Hadley  School  for  the  Blind 
September,  1966 

*And now I am going to add a few words of my own, and if you say, "Isn't
that rather strange that a man who has not heard a word of spoken speech, compressed
or otherwise, for 23 years should be talking on this subject?" If you do say
that,  I  reply  this:   who  could  possibly  have  a  greater  appreciation  of  the  spoken 
word  and  its  meaning?   I  haven't  listened  in  detail  to  compressed  speech,  I 
haven't  even  been  primarily  involved  in  compiling  and  analyzing  statistics. 
Specialities  are  not  my  speciality.   But  in  the  silence  of  hearing  no  speech, 
and  in  remembering  the  days  in  my  childhood  when  I  could  hear,  I  have  had  a 
good  deal  of  silence  in  which  to  think  and  I  would  ask  you  this  question. 
Gentlemen and Ladies: At what rate does the visual reader read? Now I
don't mean how many words a minute, but rather what principles govern
the rate of his reading? Surely his reading speed will depend on the
intelligence,  the  training,  the  practice,  and  the  motivation  of  the  reader. 
And  even  where  one  person  is  concerned,  for  we  all  agree  that  the  foregoing 
factors  vary  from  individual  to  individual,  but  where  even  one  single  person 
is  concerned,  the  reading  rate  will  vary  for  the  visual  reader.   After  all, 
not  even  a  scholar  is  going  to  read  Plato,  the  sports  page,  and  the  23rd 
Psalm at the same rate. These problems in variability are easily solved for
the  visual  reader  because  he  himself  is  his  own  speech  compressor-expansor 
unit.   He  simply  reads  faster  or  slower  according  to  the  need  of  the  moment. 


*Editor's Note--Upon completion of his reading of "The Place of Speech
Compression  in  Academic  Study",  Dr.  Kinney  added  the  following  remarks  of  his 
own. 

Now my thought is this. Ultimately, what we need in this field is a device
that will give the same options to the auditory reader. A device that will
lift him to the plane of the visual reader with all the visual reader's
freedom  of  choice.   This  could  be  done,  I  believe,  only  through  an  individual 
variable  compressor-expansor  unit  for  playing  back  recorded  speech.   In  other 
words,  a  talking  book  or  a  tape  recorder  that  contains  its  own  personal  compressor- 
expansor  so  that  the  listener  could  speed  up  or  slow  down  the  speech  to  which 
he  is  listening  according  to  his  need  just  as  the  visual  reader  unconsciously 
does.   Now,  I  am  aware  that  in  our  present  state  of  knowledge  this  may  seem 
a  terrific  assignment.   But  I  also  know  that  challenges  are   what  we  live  by, 
when  we  are  living  at  our  best.   I  like  the  slogan  of  the  Army  Engineers, 
"The  difficult  we  do  at  once,  the  impossible  takes  a  little  longer."  We 
are  willing  to  give  you  a  little  longer  but  we  remind  you  engineers,  that 
problems  are  opportunities.   An  individual  variable  compressor-expansor 
unit for playing back recorded speech--there's your problem--only you
can  turn  it  into  an  opportunity. 

CONTROL GROUP

(Regular Record Used)

Side 1--recorded and played back at 33-1/3 RPM.

Side 2--recorded and played back at 16-2/3 RPM.

This record contained one unit of a correspondence-study
course in United States Citizenship recorded at normal
reading speed (160 words per minute) and played back at
normal speed. It is the group on which all comparisons
with the test groups will be based.


CONTROL GROUP

                                            GRADES
AGE   STUDENT   EDUCATIONAL LEVEL           TEST #1   TEST #2   EVALUATION

15    --        Below high school           40        50        Should have been narrated by a man
                                                                instead of a woman.

15    --        High School                 80        50        None

15    --        High School                 50        60        Person who had brief course in
                                                                history could understand it in sixth
                                                                grade.

19    --        Freshman High School        80        40        Completed project in two hours.

25    --        High School                 90        90        Material easy to understand--would
                                                                have liked the reading faster.

28    --        High School students as     60        70        Prefers reading such material from
                well as adults or college                       braille books.
                students could understand.

--    --        High School                 90        60        Side one was easier to understand
                                                                than side two.

--    --        High School                 60        80        Material was easy to understand.

34    --        College or Adult            40        40        None

--    10        First year college          40        50        Easy to understand but too much to
                                                                thoroughly absorb in such limited
                                                                time.

43    11        High School or possible     70        60        Student feels a good memory and
                College                                         understanding of what one hears is
                                                                necessary for this experiment.

44    12        High School                 30        80        Questions too fast for blind person.

51    13        Eighth Grade                50        00        Lessons read too fast and couldn't
                                                                grasp content. Questions read too
                                                                fast.

54    14        High School                 100       70        None

55    15        Late High School or first   90        90        Suitable for adults who have not
                year College                                    finished high school. Material easy
                                                                to understand.

55    16        High School or less         90        70        Found material very simple.

--    17        High School                 70        80        Does not think all information can
                                                                be absorbed in such a short period
                                                                of time without listening to record
                                                                a few more times.

62    18        College                     80        70        Questions read too rapidly without
                                                                sufficient pause between questions.

15    19        None                        60        50        None

29    20        None                        10        50        None

TEST GROUP "A"

Side 1--recorded and played back at 33-1/3 RPM.

Side 2--recorded and played back at 16-2/3 RPM.

This record contained one unit of a correspondence-study
course in United States Citizenship compressed
to 250 words per minute and listened to at correct
playback speed, as indicated above.

TEST GROUP "A"

                                            GRADES
AGE   STUDENT   EDUCATIONAL LEVEL           TEST #1   TEST #2   EVALUATION

16    21        Test #1-High School         70        70        Prefers faster reading because it
                Test #2-8th Grade                               saves time. Found side two easier to
                                                                understand than side one.

29    22        High School                 70        None      Easy to understand.

30    23        High School and             70        50        At the beginning student missed some
                Adult Review                                    words completely--found statistics
                                                                confusing because of rapid speech.
                                                                Could have concentrated better if
                                                                read slower. Second side more
                                                                understandable.

--    24        High School or last grade   70        80        Once one becomes accustomed to rapid
                or two of grade school.                         speech and can ignore words slurred
                                                                over you can concentrate on the
                                                                important material. Believes it
                                                                could easily be adopted as a
                                                                teaching method.

32    25        High School                 80        60        Reading speed a bit "clipped" but
                                                                not too difficult to understand.
                                                                Second half of text could have been
                                                                faster without losing comprehension.

33    26        High School                 50        70        Increased speed calls for intense
                                                                concentration which older people
                                                                would find difficult. One must
                                                                listen to text more than once. This
                                                                method of teaching would be
                                                                especially good for high school
                                                                students.

--    27        High School                 60        50        Requires deep concentration. Didn't
                                                                understand some of the words because
                                                                syllables were left out and words
                                                                slurred over.

45    28        Advanced High School or     90        90        See Letter
                Adult Education

54    29        High School                 80        50        Compressed speech too fast.
                                                                Questions delivered too fast.

55    30        High School                 80        60        Test easy--questions difficult at
                                                                recorded speed.

69    31        High School                 70        50        Reading speed o.k. but feels it
                                                                would be necessary to listen more
                                                                than twice to absorb material. Not
                                                                enough time between questions.


TEST GROUP "B"

(Compressed Record Used)

Side 1--recorded at 33-1/3 RPM and played back at 45 RPM.

Side 2--recorded at 16-2/3 RPM and played back at 24 RPM
or 33-1/3 RPM, if possible.

This record contained a compressed reading of one unit
of a correspondence-study course in UNITED STATES
CITIZENSHIP and student played it back at faster than
normal playback speed. In other words, what is technically
known as rapid speech (faster than normal playback) was
superimposed upon compressed speech. The result was
reading speeds of from 300 to 450 words per minute.

TEST GROUP "B"

                                            GRADES
AGE   STUDENT   EDUCATIONAL LEVEL           TEST #1   TEST #2   EVALUATION

32    36        High School                 50        50        Feels if material was narrated by a
                                                                man instead of a woman it would have
                                                                been a little easier to understand.
                                                                At 24 rpm woman's voice is hard to
                                                                understand--at 33 1/3 rpm impossible
                                                                to understand.

16    33        Senior High School          90        80        See Letter

16    34        Upper High School           50        30        Questions were easy. Could
                                                                understand at fast speed very
                                                                easily.

18    35        High School                 50        30        See Letter

--    --        Adult Education             70        70        Would be useful to a busy person who
                                                                must decipher a great mass of
                                                                material quickly--like a blind
                                                                lawyer. It might be necessary to
                                                                listen over again on denser written
                                                                matter, such as science, which must
                                                                be carefully digested. Although the
                                                                system is intriguing and in its way
                                                                useful, student does not like it.

20    37        Upper High School           50        --        Found test frustrating--student
                                                                feels she would have to gradually
                                                                get accustomed to the rapidity and
                                                                higher pitch.

23    38        None                        None      None      Cancelled out. Found it impossible
                                                                to understand more than two or three
                                                                words here and there. Couldn't
                                                                distinguish questions at all.
                                                                Listened to side one at 45 rpm.

36    39        High School                 70        70        Side one easy to understand--side
                                                                two more difficult. Student believes
                                                                that individual must have good
                                                                concentration habits, no
                                                                interruptions and an interest in the
                                                                subject for compressed speech to be
                                                                successful.

38    40        High School                 70        70        See Letter

43    41        Upper High School and       60        70        Played side one at 45 RPM and side 2
                Undergraduate Level.                            at 24 RPM and found them
                                                                comprehendable at these speeds. Side
                                                                two at 33-1/3 RPM was very difficult
                                                                to follow. Feels compressed speech
                                                                could benefit handicapped students
                                                                with quick coverage of materials by
                                                                requiring deeper concentration
                                                                allowing easier comprehension.

--    42        Adult Education             50        70        See Letter

55    43        Adult Education             80        80        See Letter

63    44        None                        None      None      Cancelled out. Found readers talked
                                                                too fast for student to understand.

LETTERS FROM PARTICIPANTS
GROUP  A 
Student  #28  Lesson  #1--Grade  90  Lesson  #2--Grade  90 

Having  completed  and  returned  both  of  my  lessons  on  the  Compressed 
Speech  Project,  I  should  now  like  to  submit  my  report: 

I  feel  that  there  are    things  to  be  said  for  and  against  this  method 
of  recording:   It  seems  to  me  that  this  would  be  a  fine  way  of  helping  students 
to  cover  a  great  deal  of  material  in  a  minimum  of  time.   The  speech  is  perfectly 
intelligible,  even  when  the  recording  is  played  at  a  slightly  higher  rate  of 
speed  than  that  at  which  the  lessons  were  originally  recorded.   The  accelerated 
rate  compels  the  student  to  give  his  full  attention  to  the  reader,  and  it  is  my 
opinion  that  those  with  good  powers  of  concentration  and  a  retentive  memory 
ought  to  do  well  when  studying  by  this  method. 

On  the  other  hand,  those  like  myself,  who  may  require  a  little  more 
time  for  deliberation  while  reading,  may  find  that  vital  information  has  escaped 
them  while  they  were  taking  time  to  ponder  a  point.   I  feel  that  I  might  have 
done  better  on  the  lessons  if  I  had  had  more  time  to  digest  the  many  facts 
which  were  delivered  to  me  so  rapidly.   I  should  imagine,  however,  that  a  person 
might  gradually  accustom  himself  to  absorbing  material  more  quickly  and  could 
eventually  train  himself  to  be  much  more  efficient  in  the  use  of  these  recordings. 

In  my  judgment  the  particular  lessons  used  for  the  experiment  would 
be  suitable  for  an  advanced  high  school  course  or  for  adult  education. 

Thank  you  so  much  for  inviting  me  to  participate  in  this  project.   I 
have  greatly  enjoyed  doing  the  necessary  work,  and  I  do  hope  that  my  small 
efforts  will  be  of  benefit. 

GROUP  B 

Student  #33  Lesson  #1--Grade  90  Lesson  #2--Grade  80 

I  hope  that  I  can  successfully  cover  all  of  the  areas  of  the  compressed 
speech  program  that  I  wish  to  cover.   I  consider  this  material  on  a  level  of 
Senior  High  School,  or  maybe  background  for  a  freshman  government  course  in 
college.   The  material,  therefore  was  not  new  to  me,  though  it  was  indeed  necessary 
for  me  to  concentrate  on  it,  as  I'd  forgotten  most  of  the  important  facts,  or  could 
not  readily  call  them  to  mind. 

I  think,  first  of  all,  that  the  reader  in  such  a  course  should  be  a 
man.   To  speed  a  woman's  voice  up  from  R.P.M.  16  to  R.P.M.  33  is  very  difficult. 
The  fact  that  the  tone  goes  up  an  entire  octave  makes  much  more  of  a  difference 
when  the  voice  is  already  of  a  high  quality.   It  is  easier  to  concentrate,  and 
more readily comprehendable when the voice is low-pitched.

It  was  possible  to  compare  how  certain  words  sounded  in  relation  to 
one  another.   At  one  point,  the  word  "each"  was  pronounced  in  the  usual  fashion, 
while  at  other  times  it  was  pronounced  "eash".   I  was  not  surprised,  therefore 
to  find  that  the  compression  was  done  at  random.   I  was  a  little  disconcerted 
upon  receiving  the  speech  record.   I  imagined  that  the  vowel  sounds  would  be 
cut and the consonants, with the exception of S or Z or J, which have extended
sounds,  would  be  left  intact.   It  was  a  disappointment  to  realize  that  the  compression 
was  not  planned,  but  random.   It  was  not  uniform  and  thus  called  attention  to 
itself.   Upon  slowing  the  speech  down  to  normal  speed,  I  found  that  it  did  not 
sound  as  good  as  it  did  when  played  at  either  one-half  speed  or  double  speed. 
There  were  breaks  in  the  sound  of  the  recording,  not  in  the  speaker's  voice  as 
much  as  in  the  sound  of  the  record  itself.   Because  of  the  random  cuttings, 
some consonants were eliminated entirely. This phenomenon was infrequent,
and  never  very  noticeable,  but  it  was  possible  to  tell  upon  concentration. 

I think that compressed speech, plus a continuous variable speed
lever  would  be  the  best  thing  for  the  Talking  Book  user,  especially  a 
student  whose  time,  especially  reading  time,  would  be  minimized  greatly. 
However,  in  order  to  be  really  usable,  it  would  be  better  to  have  the 
recording  made  with  deliberate  splicing  out  of  long  vowel  sounds,  considerable 
breaks  in  speech  for  instance  at  the  end  of  a  sentence,  prior  to  the  beginning 
of  the  next  one.   It  would  be  helpful,  also,  to  leave  uncut  certain  portions 
of  material;  vocabulary  words,  spelled  portions  of  the  text,  names  or  other 
material  which  is  important  for  its  mechanics  or  some  other  factor. 

Although  this  report  may  seem   rather  critical,  considered 
altogether,  my  criticisms  would  not  constitute  opposition  or  dislike  of 
the  recording  at  all.   I  am  only  trying  to  be  as  critical  as  possible  so 
that  every  possible  bug  may  be  eliminated  from  this  infant  program.   I 
hope  that  my  observations,  as  well  as  my  participation  in  this  program  has 
been  of  some  help  to  both   Hadley  School  and  the  Library  of  Congress.   If 
there  is  anything  that  I  failed  to  cover,  or  if  further  comments  would  be 
desired,  feel  free  to  contact  me.   You  might  also  let  me  know  the  outcome 
and  the  decisions  made  as  a  result  of  this  experimentation.   Thank  you  very 
much  for  the  opportunity  to  participate  in  such  a  project. 
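
Student #33's remark that doubling the playback speed raises the voice by an entire octave can be checked with the standard relation between a frequency ratio and a musical interval: playing a record faster by a factor r raises every frequency by the same factor, which corresponds to 12 times the base-2 logarithm of r in semitones. The sketch below applies that relation to the playback speeds prescribed for Group B. The function name is illustrative only.

```python
import math


def pitch_shift_semitones(recorded_rpm, playback_rpm):
    """Rise in pitch, in semitones, when a record processed at
    recorded_rpm is played back at playback_rpm (12 semitones = 1 octave)."""
    if recorded_rpm <= 0 or playback_rpm <= 0:
        raise ValueError("speeds must be positive")
    return 12 * math.log2(playback_rpm / recorded_rpm)


if __name__ == "__main__":
    rpm_16 = 50 / 3    # 16-2/3 R.P.M.
    rpm_33 = 100 / 3   # 33-1/3 R.P.M.
    cases = [
        ("Side one, 33-1/3 to 45 R.P.M.", rpm_33, 45),
        ("Side two, 16-2/3 to 24 R.P.M.", rpm_16, 24),
        ("Side two, 16-2/3 to 33-1/3 R.P.M.", rpm_16, rpm_33),
    ]
    for label, recorded, playback in cases:
        shift = pitch_shift_semitones(recorded, playback)
        print(f"{label}: {shift:.1f} semitones")
```

The doubling from 16-2/3 to 33-1/3 R.P.M. gives exactly 12 semitones, or one octave, while the two prescribed speeds raise the pitch by roughly five and six semitones.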

GROUP  B 

Student  #35  Lesson  #1--Grade  50  Lesson  #2--Grade  30 

I  think  this  course  should  be  given  on  the  high  school  level. 
The  exercises  in  vocabulary  building,  I  think,  are  a  trifle  below  high  school 
level,  but  the  material  presented  would  be  interesting  to  high  school  students. 

Contrary  to  what  other  people  think,  I  have  found  that  the  faster 
a  thing  is  presented,  the  more  I  get  out  of  it.   I  think  that  this  is 
because,  in  order  to  grasp  everything  that  is  said,  one's  concentration 
must  be  focused  on  what  is  being  said.   This  may  not  be  true  for  all  people, 
but  it  has  been  so  with  me. 

For  me,  at  least,  this  only  applies  to  textbooks.  My  interest 
in  literature  is  generally  keen,  no  matter  at  what  speed  the  material  is 
presented.  However,  when  the  material  is  less  interesting,  I  have  found 
that  a  faster  rate  of  speed  helps  me  to  concentrate. 

GROUP  B 

Student #40 Lesson #1--Grade 70 Lesson #2--Grade 70

Thank  you  for  allowing  me  to  participate  in  the  Compressed  Speech 
Program.   I  hope  that  my  findings  and  opinions  will  prove  helpful. 

After  carefully  following  the  instructions  given  for  this  experiment, 
I  must  report  that  my  success  in  understanding  the  recorded  material  was 
very  limited.   So  limited,  in  fact,  that  I  had  to  rely  on  a  small  amount 
of  knowledge  and  a  large  amount  of  conjecture  in  preparing  my  examination 
reports.

Starting  with  side  one  of  the  record,  I  found  that  the  speech 
was,  for  the  most  part,  intelligible  when  heard  at  the  speed  of  45  R.P.M. 
However,  the  concentration  necessary  for  understanding  and  the  rapidity 
of  the  delivery  left  me  little  opportunity  to  actually  learn  much  from  the 
text. This was also true for the second side of the record, with one exception.
When listened to at 24 R.P.M., the speech on side two was less intelligible.
(According  to  the  smaller  R.P.M.  difference,  side  two  should  have  been 
easier  to  understand  than  side  one.)   Needless  to  say,  the  33-1/3  R.P.M. 
speed  for  side  two  was  entirely  too  fast  for  me  to  understand  anything. 
Understandable  or  not,  the  material  played  back  at  faster  than  normal 
speeds  provided  difficult  listening. 

Judging  by  the  caliber  of  the  examination  questions  and  by 
what  little  information  I  was  able  to  pick  up  from  the  record,  I  would 
place this course--compressed speech included--at the high school level.
I  think  that  students  in  the  high  school  age  group  have  the  best  chance 
of  mastering  the  teaching  innovation.   Of  course,  given  suitable  subjects, 
this  might  also  be  said  for  those  in  the  college  age  group. 

I  have  very  much  enjoyed  taking  part  in  this  project.   I  hope 
the  above  information  will  be  satisfactory.   If,  however,  I  have  overlooked 
something,  please  let  me  know. 

GROUP  B 

Student #42  Lesson #1--Grade 50  Lesson #2--Grade 70

I found the material easy enough to understand, but extremely
unpleasant to listen to at the fast speed. I mean by that, that the
material was easy to understand, but difficult at the fast speed. For
side two I used the 24 R.P.M. for the second reading. I should evaluate
the  course  as  (b)   Adult  Education. 

I  should  like  to  comment  in  general  that  I  did  not  enjoy  the  fast 
speeds, the 33 R.P.M. being more than difficult. If I were a student, I
think  I  should  not  study  well  at  such  speeds  because  they  do  not  allow  for 
note  taking  and  because  such  intense  attention  is  required  merely  to 
understand  the  subject  that  none  is  left  for  the  substance. 

GROUP  B 

Student  #43  Lesson  #1--Grade  80  Lesson  #2--Grade  80 

In  line  with  your  instructions  for  the  Compressed  Speech  Program, 
I  am  sending  this  letter  of  evaluation. 

In  my  opinion  the  material  is  at  the  adult  education  level. 

Since  there  are  no  experts  in  the  Compressed  Speech  field,  I  feel 
that  a  few  observations  and  suggestions  based  on  a  half  century  of  listening 
might  not  be  out  of  order. 

The  Talking  Book  has  been  a  source  of  enjoyment  and  education 
to  me  since  the  days  when  it  was  necessary  for  one  to  buy  his  own  machine. 
When  combing  material  in  search  of  certain  facts,  I  have  often  wished  that 
the  record  could  be  speeded  up  in  order  that  so  much  time  would  not  be 
consumed.   The  only  speed  change  available  was  double  and  that  is  impossible 
even  for  skimming.   I  experimented  with  other  machines  and  found  that  a 
33 R.P.M. record could be played at 45 R.P.M. and still be easily understood.
The same is true when a 16 R.P.M. is played at 24 R.P.M. From the above it
seems  that  if  a  machine  were  available  with  constantly  variable  speed  one 
might  cover  more  ground  in  a  shorter  time.   There  is  one  drawback  to  this, 
however,  and  it  is  this:   one  often  passes  the  wanted  fact  and  desires  to 
go  back  and  get  it.   There's  the  rub.   With  a  disc  there  is  no  going  back 
without (1) setting the tone arm back too far and losing more time and
(2), which is more important, ruining the record for the next user. This
leads  us  logically  to  tape,  but  not  on  reels.   People  handle  tape  as  carelessly 
as they handle records. In my opinion, the ultimate system would be a
tape  cartridge  machine  which  could  be  used  both  for  entertainment  and 
study.   It  would  be  possible  to  reverse  tape  direction  and  change  speed 
of  tape. 

I  had  no  difficulty  in  listening  to  the  compressed  speech  when 
it  was  played  at  the  normal  speed  of  the  record  but  at  higher  speeds  too 
many  words  are  lost,  not  because  of  the  compression  but  the  combination 
of  both. 

I  should  be  happy  to  do  anything  which  would  further  this  kind 
of  work. 


CHAPTER VI

"Use Of Compressed Speech, Tapes And Discs And Of Variable
Frequency Power Supply With Selected
Children And Adults"


Josephine L. Taylor*
There  are    several  hundred  children  and  youth  in  New  Jersey  who  are  using 
disc  or  tape  recorded  materials  for  part  of  their  school-required  reading.   Since 
tape  recorders  have  been  available  on  the  American  Printing  House  for  the  Blind 
Quota  Account  and  since  the  purchase  of  a  tape  duplication  system,  through  which 
it  is  possible  to  make  multiple  copies  rapidly  and  also  to  exchange  recording  speeds, 
the  use  of  tapes  has  far  exceeded  disc  recordings  in  popularity. 

During  the  past  six  months,  two  "trial-run"  projects  were  attempted-- 
one  pertaining  to  compressed  speech  and  the  other  concerning  variable  frequency 
power  supply.   These  might  better  be  called  reactions  to  small  samplings  of 
compressed  speech  or  variable  frequency  power  supply  by  a  limited  number  of 
participants.   It  will  be  evident  that  this  is  a  report  of  the  above  nature 
and  not  a  research  project  or  a  pilot  study  or  even  a  study.   The  number  of 
participants  in  the  trial-run  was  limited  by  an  effort  to  coordinate  the  textbook 
needs  of  students  in  New  Jersey;  Atlanta,  Georgia;  and  Nashville,  Tennessee 
and  the  Tennessee  School  for  the  Blind  and  by  the  late  acquisition  of  necessary 
materials  as  well  as  other  problems  of  human  and  mechanical  frailties.   However, 
the  trial-run  has  shown  a  desire  on  the  part  of  the  readers  for  more  experience 
with  compressed  speech  at  various  speeds  and  more  experience  with  variable 
frequency  power  supply  in  use  with  tapes  and  discs. 

The  first  part  of  the  project  involved  two  textbooks  used  by  New  Jersey 
students  and  those  in  other  educational  programs  participating.   These  were 
compressed to 275 wpm by Dr. Emerson Foulke through an agreement with the
Library of Congress. Of the seven New Jersey students planned for the first
section, three participated (one moved to another state; one received the
materials too late, the teacher having used the latter part of the book earlier
in the year; one never got to that section of the textbook; and one dislikes
tapes and preferred to struggle along with very slow print reading). The book
compressed for this group was History of Our World by Boak, et al.; Houghton
Mifflin Co., 1961. The comments of the three participants follow. Janey, a
fourteen-year-old ninth grader with 10/200 vision and superior intelligence,
wrote as follows:

"Last  Spring  I  received  compressed  speed  tapes  of  my  history  book 
to  compare  with  normal  tapes.   I  found  that  I  could  get  just  as 
much  from  the  fast  tapes  in  much  less  time.   They  were  most  helpful 
for  review  at  examination  time." 

Howie, a fifteen-year-old tenth grader, totally blind, with very superior
intelligence, wrote:

"Now that school is over and I am finished with my History tapes, I
am  sending  you  my  opinion  of  the  275  wpm  tape.   I  think  they  are 
marvelous,  and  they  cut  down  on  the  reading  time.   They  are    like 
skimming,  because  you  don't  have  to  listen  to  the  small  words,  you  just 
have  to  listen  to  the  bigger  ones  and  they  make  sense.   I  only  have 
two  complaints  about  the  tape. 


*Miss Josephine L. Taylor is Director of Educational Services for
the New Jersey State Commission for the Blind, Newark, New Jersey, 08102.

First one is that sometimes the reader could not spell the word,
and  they  were  not  always  clear  enough  to  get  an  idea  of  what  the 
word  should  be  spelled  like.   I  think  that  whenever  possible  the 
reader  should  spell  the  new  word.   My  second  complaint  is  that  the 
numbers  on  the  tape  did  not  always  coincide  with  the  numbers  on 
the  labels  of  the  tapes.   This  became  rather  confusing. 

Those  were  the  only  two  complaints  I  had,  and  I  think  that  this 
is  a  stepping  stone  into  the  reading  future  of  the  blind." 

Priscilla,  a  fourteen-year-old  ninth  grader  with  20/200  vision  and  superior 
intelligence,  wrote  as  follows: 

"I  think  you  will  be  happy  to  learn  I  do  favor  the  compressed 
tapes  to  the  slow  speed  tapes.   I  have  found  the  slow  speed  discs 
which  I  read  for  school  that  my  mind  wanders  because  of  the  slowness 
of  the  reader  and  all  the  time  he  spends  talking  about  charts  and 
pictures. With the compressed tapes I have found that I am forced
to  listen  in  order  to  understand  what  the  reader  is  saying.   You 
sent  me  the  compressed  tapes  for  my  history  book  last  year.   You 
may  want  to  take  into  consideration  that  I  like  History  very  much 
and  also  that  my  history  book  wasn't  as  difficult  reading  as  maybe 
English  or  Science  might  have  been.   I  don't  think  it  would  be  good 
to  record  compressed  tapes  for  English,  especially  books  with 
colorful vocabulary. However, I cannot really say because I haven't
tried  these  yet. 

If  it  were  possible,  I  think  it's  a  good  idea  to  record  the  tapes 
in  such  a  way  that  they  could  be  slowed  down  in  order  to  take  notes 
or  re-read  certain  materials. 

Thank  you  for  allowing  me  to  be  part  of  your  experiment.   I  am 
looking  forward  to  receiving  compressed  tapes  again." 

The second textbook that was compressed to 275 wpm was Adventures
in Appreciation by Loban, 1958 edition. Of the five students who were selected
for  this  part  of  the  trial-run,  one  moved  out  of  state  and  one  received  the 
section  too  late  (teacher  skipped  around  in  the  book).   The  remaining  three 
reported  as  follows:   Lisa,  a  fifteen-year-old  totally  blind  ninth  grade 
student with high average intelligence:

"I am writing to answer your letter about the tapes with
the compressed speech. I thought the tapes were easy to
read. I liked them because they weren't much harder to read than
the regular tapes but they shortened your reading time. I think
I would like to try tapes which are in a more compressed form."

Because  of  the  last  comment,  Lisa  was  selected  for  another  trial-run  project 
described  later  in  this  paper. 

Rachael,  a  totally  blind  fifteen-year-old  ninth  grade  student  with  average 
intelligence  did  not  surprise  us  with  her  reaction  since  she  is  one  who 
resists  change,  does  not  accept  orientation  and  mobility  instruction  nor 
join  in  extra-curriculum  activities  especially  if  suggested  by  someone  else; 
spends  hours,  perhaps  too  many,  on  braille  and  talking  book  reading.   Rachael 
resists any change from a highly structured environment. Her comments
were as follows:

"I have listened to the compressed speech tape of Adventures
in Reading. I found the cutting off of parts of the words
very  distracting.   It  made  the  reading  uninteresting  to  listen 
to.   Even  though  the  regular  tape  takes  longer  to  listen  to, 
I  prefer  the  regular  to  the  compressed  speech.   I  often  had  to 
replay  parts  of  the  tape  because  of  the  fast  reading.   Therefore, 
I  doubt  if  much  time  is  saved  by  this  method.   I  hope  my  comments 
are   of  some  help  to  you." 

Mary,  a  fourteen-year-old  ninth  grader,  totally  blind  with  high  average 
intelligence  and  also  a  rather  highly  structured  girl,  wrote: 

"I  felt  I  had  no  real  trouble  understanding  the  tapes  at  this 
speed  but  I  wouldn't  want  them  recorded  at  any  faster  rate  of 
speed."

Three  June  high  school  graduates,  all  college  bound,  happened  to 
be  working  at  the  agency  offices  as  part  of  our  summer  student-employment 
program.   Since  they  were  eager  to  evaluate  compressed  speech,  we  arranged 
for  them  to  use  the  Library  of  Congress  distribution  entitled,  "Conditioning 
Record  of  Compressed  Speech-Selected  Stories  of  O'Henry  by  W.  S.  Porter". 
This  is  a  talking  book  record  that  starts  at  165  words  per  minute  and  by 
approximately  2  1/2  minute  intervals,  moves  up  to  275  words  per  minute. 
Each  student  wrote  a  short  paragraph  evaluating  the  disc.   Sam,  age  17, 
wrote as follows:

"The first two rates of speed were relatively easy to comprehend.
At the third and fastest speed, I could still comprehend what was
being said. However, one must give his strictest attention to
the speaker without having any distraction whatsoever. When
listening to talking books in a normal environment, with occasional
sounds, one must keep his ears acutely tuned to the speaker. At
that fast rate of speed, intense concentration is required. I
would think that the average person's mind is apt to wander and
he would miss most of the translation. He would at least miscomprehend
some of the wording and, therefore, miss the drift of the sentence."

Helen,  also  totally  blind,  and  also  a  very  superior  student,  (valedictorian 
of  her  public  high  school  class)  wrote  as  follows: 

"I  understood  the  first  story  and  the  first  half  of  the  second 
story  without  effort.   The  last  half  of  the  second  story  required 
a  little  more  concentration.   In  the  first  half  of  the  third  story, 
some  words  were  hard  to  understand  but  were  understandable  on  second 
reading.   At  all  times  during  the  first  reading  of  this  portion, 
it  is  possible  to  understand  the  main  idea  of  all  sentences.   The 
last  half  of  the  third  story  required  close  attention;  some  words 
and  phrases  unclear,  but  recognizable  upon  second  reading;  main 
idea  of  most  sentences  clear  when  first  read.   At  no  time  was 
anything  on  the  record  impossible  to  understand." 
The third student has 10/200 vision and reads ordinary print with special
lenses. She also is a very superior student, being both a national scholar
and a presidential scholar. Her comments are as follows:

"Throughout  the  first  story  and  part  of  the  second,  I  noticed 
no  marked  change.   By  the  end  of  the  second  story  and  the 
beginning  of  the  third,  there  is  a  definite  change  but  the 
text  was  still  very  understandable  without  extra  concentration. 
However,  I  did  notice,  having  listened  for  awhile  and  come  back 
to  it,  that  the  change  was  more  noticeable,  and  it  was  a  bit 
harder  to  get  used  to  the  more  clipped  (or  compressed)  text  all 
at once than it had been to get used to it gradually. However,
there  was  still  little  problem.   By  the  end  of  the  record,  the 
text  was  quite  clipped  and  moving  rather  fast.   It  required 
quite  a  bit  of  concentration,  but  was  still  understandable, 
though  it  might  be  hard  to  tell  if  the  entirety  of  it  would 
sink  in  under  ordinary  circumstances,  or  if  one  could  get  used 
to the text this fast all at once."

It  should  be  noted  that  the  students  who  reacted  to  the  compressed 
speech  tapes  and  discs  were  all  academically  oriented.   This  occurred, 
not  through  deliberate  selection,  but  through  choice  of  textbooks  and 
availability  of  students.   However,  based  on  our  experience  with  this 
limited  group,  we  recommend: 

1. A trial-run of compressed speech with younger students and
with some who are not as academically oriented.

2. Development of a system for quick and inexpensive compression
of textbooks on a tailor-made plan hopefully available on
"quota" through the American Printing House for the Blind.

3. Experimenting with the use of compressed speech along with
variable speed.

4. Development of an individualized compressor and expander
of speech attachable to tape recorders.

The  next  part  of  the  trial-run  was  concerned  with  variable  frequency 
power  supply  which  involves  a  piece  of  equipment  available  from  the  American 
Foundation  for  the  Blind  through  which  rapid  speech  is  obtained  by  operating 
the  disc  or  tape  reproducer  at  two  times  the  normal  rate  of  speed  and 
adjusting  downward  by  means  of  a  control  to  the  satisfaction  of  the  reader. 
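
The trade-off behind this control can be sketched with a little arithmetic. The following is only an illustration, not a description of the unit itself: the function names are hypothetical, and it assumes that the reproducer is simply run faster, so that listening time falls in proportion to the speed factor while the pitch of the voice rises by 12 times the base-2 logarithm of that factor, in semitones.

```python
import math


def listening_minutes(normal_minutes, speed_factor):
    """Playing time when a recording is run at speed_factor times normal."""
    return normal_minutes / speed_factor


def pitch_shift_semitones(speed_factor):
    """Rise in pitch, in semitones, caused by running the recording faster."""
    return 12 * math.log2(speed_factor)


# One hour of recorded reading at several settings, up to the
# double speed mentioned above.
for factor in (1.0, 4 / 3, 1.5, 2.0):
    minutes = listening_minutes(60, factor)
    shift = pitch_shift_semitones(factor)
    print(f"{factor:.2f}x normal: {minutes:.1f} minutes, {shift:+.1f} semitones")
```

At double speed the hour is heard in thirty minutes, but every voice is raised a full octave, which is one plausible reason a reader might adjust the control downward.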

Selection  of  the  participants  in  this  project  was  as  circumstantial 
as  in  the  previous  groups.   However,  because  the  equipment  was  adapted  for 
talking  books  rather  than  tapes,  there  was  an  additional  selection  item 
due  to  the  strong  preference  for  tape  recorded  materials  among  our  potential 
participants. Although all were enthusiastic in varying degrees, by far
the most enthusiastic was a graduate student, a doctoral candidate, for whom
special arrangements were made for adaptation of his tape recorder so that
either  disc  or  tape  recordings  could  be  used.   Twenty-four  persons  were 
involved  in  this  trial-run. 

Eight of these were adults with an age range of 40-60 and an average
of 45. Two used the equipment on graduate school readings; the others on
leisure time readings. All had at least a master's degree. Here, too,
participants were selected because they were readily available and interested.
Four of the participants were entering their freshman or sophomore
year in college (same selection factor as above). Eight were high school
eleventh and twelfth graders who happened to be using the American Printing
House for the Blind's talking book recorded textbooks, Adventures in English
or Adventures in American Literature. Four others ranged from a June high
school graduate down to a sixth grader and a member of a class for slow
learners. These four used the equipment with leisure time reading
books.

Due  to  time  limitations,  only  selected  comments  from  their  reports 
will  be  read.   None  were  asked  specific  questions  but  merely  to  give  their 
reactions  (this  was  also  true  for  the  compressed  speech  readers).   The  adults 
commented  on  the  need  to  have  the  equipment  smaller  and  lighter  and  adapted 
to  tape  recorders  without  having  to  adapt  the  tape  machine.   Four  of  the 
adults  felt  they  would  prefer  normal  speed  or  slight  increase  for  pleasure 
reading but all were enthusiastic for increased speed for "must" or
professional reading--"the speed keeps them alert and interested".

Those  entering  their  freshman  and  sophomore  years  in  college  have 
requested  this  device  with  adapted  tape  recorders  for  textbooks  but  felt 
that  they  would  use  only  slight  speed-up  for  pleasure  reading  (when  they 
have  time  for  such).   They  were  very  enthusiastic  claiming  it  invaluable 
for  college  assignment  reading. 

Only two adults and two younger participants complained of the voice
distortion--as one said, "Alexander Scourby's beautiful voice was made
horrible". All four commented on the bad effect on women's voices. One,
a fourteen-year-old slow learner, commented, "The faster speed kept me
more interested but as I played it fast the ends of the words were clipped".
She used it in reading Door in the Wall and Christopher Columbus, about
fourth  grade  level  books.   But  she  also  said,  "If  I  used  it  longer,  I 
could  really  get  used  to  it  and  wouldn't  mind  the  word  endings".   Some  of 
her  other  comments  are,  perhaps,  worth  noting.   "I  enjoyed  using  it  very 
much.   It  helped  to  make  my  records  enjoyable.   It  was  very  heavy.   I 
got  my  fingers  caught  in  it  but  I  did  not  mind." 

The eleven-year-old sixth grader, functioning on a superior
level, noted that she could speed it up to twice as fast for parts of a story
that weren't interesting, that it was easy to learn to operate but that
in  the  beginning  she  had  a  little  trouble  with  the  motor  "roaring".   She 
begged  to  keep  the  machine  so  that  she  could  read  more  talking  books 
on  it. 

Another  participant  was  the  girl  who  wanted  compressed  speech  at 
a  greater  speed.   (The  only  one  to  take  part  in  both  facets  of  the  program.) 
She commented as follows:

"I'm writing to tell you how I liked the device for speeding up
the records on the 16 speed. My counselor gave it to me about
two weeks ago. One book that I read was on the number 64. When
I tried another book, I could understand it with the machine at
a faster speed. For this book, the machine was on number 68.

I like the machine very much. I think that you could save time
and still understand the books clearly. I especially like it
because I could regulate the machine as fast or as slow as I
wanted it."

From  among  those  using  the  unit  for  textbooks  on  talking  books, 
two  seem  to  express  the  general  reactions.   Kit  wrote: 
"The machine that speeds up the records has advantages and
disadvantages. I like to listen to the records speeded up when
I am in a hurry or want to get the assignment done faster. When
the records are speeded up, the voice is distorted and I lose
the effect of the story or poem.

I set the control bar at either the second or third dot, which
is either 40 or 60. The machine helps because I can play it a
little faster than at the normal speed of the recording. If
I am reviewing the assignment, I can play it much faster and
do not have to worry about details."

Another  student's  comment  was  as  follows: 

"In  my  opinion,  the  variable  frequency  power  supply  unit  I  was 
requested  to  test  is  quite  satisfactory.   I  can  more  easily 
concentrate  on  a  book  when  it  is  speeded  up  on  my  talking 
book  machine  by  this  device.   However,  there  is  some  loss  in 
volume  in  the  loud  speaker  but  this  is  not  a  problem,  for  the 
instructions  include  directions  of  how  to  modify  the  wiring 
of  the  talking  book  reproducer  slightly  to  overcome  this.   I 
suggest  that  these  units  be  placed  in  wide  circulation." 

As a result of our trial-runs with variable speed, we: (1)
have already requested the American Printing House for the Blind to
investigate adapting their tape recorders for use with the variable
speed unit; (2) have begun ordering units for use of college students
and, as our budget permits, will extend this downward; (3) recommend
research toward a smaller, lighter unit; (4) recommend combining
variable speed into the talking book again (remember the first model?);
(5) would like to participate in a broad study of the wider use of this unit
and of the combination of compressed speech tapes and the variable speed
unit.

In a speech prepared for the American Association of Workers
for the Blind 1966 Convention, Keynote Address, the Honorable Wilbur Cohen,
Undersecretary of Health, Education and Welfare said, regarding education,
"We must explore new teaching methods and devices that will help the
blind to compete on an equal footing with the rest of us. We need to print
more books for the blind--interesting and stimulating books".

We  have  here  two  devices  to  further  explore  and  two  ways  of  making 
books  more  interesting  and  stimulating.   Let's  meet  his  challenge! 


CHAPTER VII

Some Observations On Speeded Speech
By Visually-Impaired Students
In The Atlanta Public Schools


Arthur Lown*


AIMS: 

1.  To  determine  the  preferred  medium;  unaltered  speech,  mechanically 
speeded  speech,  or  time-sampled  speech; 

2.  To  obtain  opinions  on  the  effect  of  transmitting  the  preferred 
medium  by  telephone; 

3.  To  observe  individual  differences  in  the  use  of  the  chosen 
medium.

MATERIALS  AND  EQUIPMENT: 

An  experimental  recording  of  time-sampled  speech;  "Selected  Stories 
of O'Henry" with an increasing compression rate from 165 wpm to 275 wpm.

A  Library  of  Congress  talking  book  reproducer. 

A Wollensak tape recorder.

A recorded copy of Reader's Digest for May 1966.

A  Powerstat  Variable  Auto  Transformer. 

Two  telephones  on  straight  lines  of  k .J   miles  apart. 

One  speaker  phone. 

PROCEDURES: 

Of the 180 blind and partially-seeing students enrolled in the
Atlanta schools, 19 were chosen from grades 7 through 12: 11 braille
readers and 8 print readers. The group listened to the first and final
segments of the O'Henry record and then to the first segment with speed
increased by approximately one-third with the Powerstat. The experiment
was  repeated  with  students  listening  individually  to  the  same  material 
transmitted  over  the  telephone,  first  with  a  conventional  hand  set  and 
then  with  the  speaker  phone.   Finally,  the  students  listened  individually 
to  portions  of  four  selections  from  the  Reader's  Digest  with  each  student 
controlling  the  speed  of  the  record  as  desired  with  the  powerstat. 
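
The two kinds of rapid speech compared in this procedure differ in how the time is saved. Mechanically speeded speech plays the whole recording faster, which also raises the pitch of the voice. Time-sampled speech discards short segments at regular intervals and plays what remains at the original speed. The sketch below illustrates only the second idea, under loud assumptions: the function names and segment lengths are hypothetical, the simplest possible keep-and-discard scheme is assumed, and it is not a description of the equipment that produced the O'Henry record.

```python
def time_sample(samples, keep, discard):
    """Keep `keep` samples, drop the next `discard`, and repeat to the end."""
    period = keep + discard
    kept = []
    for start in range(0, len(samples), period):
        kept.extend(samples[start:start + keep])
    return kept


def compressed_rate(original_wpm, keep, discard):
    """Word rate of the shortened recording when played at its original speed."""
    return original_wpm * (keep + discard) / keep


# Stand-in for a recording: 100 numbered samples.
recording = list(range(100))
shortened = time_sample(recording, keep=3, discard=2)

print(len(shortened))               # 60 of the 100 samples remain
print(compressed_rate(165, 3, 2))   # 275.0
```

Under these assumptions, keeping three parts in every five is enough to carry a 165 wpm reading to 275 wpm, the two ends of the range on the record used here. Because whole segments are removed, listeners may notice that parts of words are missing.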

RESULTS: 

1. Student choices were as follows: Unaltered speech--two students,
rapid speech--none, time-sampled speech--seventeen.

2.  All  members  of  the  group  stated  that  when  listening  to  any  of 
the three types of speech on the conventional telephone, the surrounding
noise  level  should  be  kept  at  a  minimum.   No  such  requirement  was  mentioned 
for  the  speaker  phone. 

3.  The  following  speed  settings  were  noted  when  students  were 
allowed to pace themselves with Reader's Digest articles.


*Dr. Arthur Lown is Coordinator of the Program for Visually
Impaired  at  the  Braille  Library,  Atlanta,  Georgia. 
Article                         Range (wpm)    Mean (wpm)

Word power                      114-193        1^7

Picturesque speech              145-203        183

Life in these United States     183-297        209

The last battle                 253-352        321

LIMITATIONS: 

1.  The  comprehension  factor  was  not  included. 

2.  The  brevity  of  the  observations  failed  to  take  into 
account  "warm  up". 

3. Since words per minute were counted, and since the dial of
the Powerstat was read tactually, the rates given represent approximations.

4. Some pauses were encountered in the word power article.

5. It is recognized that words-per-minute is not nearly as
desirable as syllables-per-minute.
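
The fifth limitation can be made concrete with a small sketch. The word and syllable counts below are invented for illustration and were not measured in this study, and the function name is hypothetical. The point is only that two passages read at the same number of words per minute can differ considerably in syllables per minute.

```python
def speech_rates(word_count, syllable_count, minutes):
    """Return (words per minute, syllables per minute) for one passage."""
    return word_count / minutes, syllable_count / minutes


# Hypothetical one-minute passages: same word count, different word lengths.
passages = {
    "mostly short words": (275, 360),
    "mostly long words": (275, 500),
}

for label, (words, syllables) in passages.items():
    wpm, spm = speech_rates(words, syllables, 1.0)
    print(f"{label}: {wpm:.0f} words per minute, {spm:.0f} syllables per minute")
```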

CONCLUSIONS: 

1.  The  preferred  medium  of  the  group  is  time-sampled  speech. 

2.  The  275  wpm  is  a  reasonable  rate  of  compression. 

3.  Telephone  transmission  with  a  low  noise  level  is  satisfactory 
and  with  a  speaker  phone  is  completely  satisfactory. 

4. When self-pacing is permitted, it is evident that an individual
rate  changer  is  extremely  desirable  for  listeners. 


CHAPTER VIII

An Experimental Program In Compressed Speech
At The Tennessee School For The Blind


Randall Harley*

A  pilot  study  was  conducted  at  the  Tennessee  School  for  the  Blind 
to  evaluate  the  attitudes  of  blind  subjects  toward  the  use  of  compressed 
speech  in  a  practical  classroom  application. 

Subjects

Five  male  and  ten  female  subjects  from  ages  18  to  25  in  a  senior 
English  class  participated  in  the  program.   The  group  was  composed  of  9 
braille  and  6  print  readers.   One  of  the  subjects  manifested  a  moderate 
hearing  loss,  but  a  hearing  aid  enabled  her  to  participate  in  classroom 
discussion.   The  other  subjects  were  judged  by  the  teacher  to  have  normal 
hearing.   No  subjects  had  previously  heard  compressed  speech. 

Procedure 

The tapes of the last part of the text, a unit from Adventures
in English, had been compressed to 275 words per minute by use of a Tempo-Regulator.
These tapes were furnished to the teacher for use with the
subjects  for  a  period  of  one  month.   The  subjects  were  asked  to  evaluate 
the  compressed  speech  at  the  beginning  and  at  the  end  of  the  trial  period. 
During  the  project,  the  teacher  administered  and  scored  a  series  of  ten 
comprehension  tests  that  were  obtained  from  the  text. 

First,  the  tapes  were  played  on  a  Wollensak  tape  recorder  in  the 
classroom.   Later,  homework  assignments  were  made  which  involved  use  of 
the  compressed  speech  materials.   The  tape  recorder  was  centrally  located 
so  that  subjects  could  meet  together  and  listen  to  the  tapes  in  the  evening. 
The  tapes  were  played  at  least  one  time  for  all  subjects.   The  tapes  were 
repeated  as  desired  by  the  subjects.   The  teacher  tested  the  subjects  at 
the  end  of  each  section  of  the  unit  with  the  appropriate  objective  test 
from  the  text.   This  procedure  was  followed  for  a  period  of  one  month. 

Results 

The  comprehension  test  results  indicated  that  the  subjects  tended  to 
score  relatively  low  on  the  initial  test.   The  scores  ranged  from  18  to 
63  on  a  100  point  test  with  a  mean  of  43.   However,  the  percentage  of  correct 
responses  showed  a  marked  increase  on  the  second  test  and  held  relatively 
steady during the remainder of the testing. The scores ranged from 40
to  90  on  a  100  point  test  with  a  mean  of  67.   According  to  the  teacher's 
evaluation,  the  subjects  who  had  been  scoring  high  with  print  and  braille 
materials  tended  to  do  best  with  the  recorded  materials.   The  subject  with 
the  hearing  loss  ranked  last  among  the  group  in  test  scores  of  comprehension. 
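
The size of the change between the first test and the later testing can be worked out from the two means reported above. This is a minimal sketch using only those two figures; the function name is hypothetical, and no individual scores are assumed.

```python
def mean_gain(first_mean, later_mean):
    """Return the gain in points and as a percentage of the first mean."""
    points = later_mean - first_mean
    percent = 100 * points / first_mean
    return points, percent


# Means reported above: 43 on the initial test, 67 afterward.
points, percent = mean_gain(43, 67)
print(f"Gain of {points} points, {percent:.1f} percent above the first mean")
```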

Initial  subjective  evaluations  by  subjects  generally  indicated  some 
interest  in  compressed  speech  but  widespread  difficulty  in  understanding 
the  selection  was  noted.   The  initial  selection  may  not  have  been  as  interesting 
as  later  selections  since  it  consisted  of  an  introduction  to  the  unit  rather 
than  actual  prose  and  poetry  which  followed. 

Final evaluations by subjects generally indicated a much more favorable
response to compressed speech than the initial evaluations.
Favorable comments from the students could be grouped
into the following categories:


*Dr. Randall Harley is with the Special Education Department of George
Peabody  College,  Nashville,  Tennessee. 
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1  .   saving  of  time 

2.  increasing  listening  skills 

3.  making  assignments  more  interesting 

Unfavorable  reactions  could  be  grouped  into  such  categories  as: 

1.  inability  to  comprehend  because  of  speed 

2.  inability  to  comprehend  because  of  distortion 

3.  ineffectiveness  with  poetry 

4. the annoyance of footnotes and line numbers inserted at
inappropriate times

The initial and final comments of the subjects include the following
remarks (each subject's rank in average comprehension score is shown):

1. (rank of 11)

Initial: It is a quick way of reading and maybe after a while
I will get the quickness of the reading in my slow mind. This
sure would help me with my book reading on talking book. It
is a great experience.

Final: The compressed speech has helped a lot in my reading work
after school by finishing sooner, sometimes you just don't get
around to finishing all of it. The compressed speech has helped
me a lot also by listening attentively to the reading ... it
has made me try different methods of listening for the important
parts. I think this has made me a little better in my listening
ability.

2. (rank of 12)

Initial: I don't like this topic, but it is very interesting
and educational.

Final: I think compressed speech is a waste of time, it is too
fast. It is very colorful, unique and superb, but "I don't like
it".

3. (rank of 13)

Initial: Sometimes you can't understand all of what he is
saying and I think it is fast.

Final: I think compressed speech is good for stories. In poetry
I can't get the meaning because it is too fast and it gives the
number of the lines.

4. (rank of 2)

Initial: The understanding of the material is rather difficult.
Probably if we were more accustomed to it, this would aid in our work.

Final: The compressed speech is a great asset for reading prose.
More material may be covered by this means. For poetry, I would
prefer to read poetry by myself. I feel that the value of poetry
must be personal for each individual; therefore he must read at
his own rate.

5. (rank of 3)

Initial: I think that after the first six minutes it gets a
little hard to follow. However, after I could get used to it
it would be great.

Final: Compressed speech in the classroom has both advantages
and disadvantages. It depends on several factors whether or
not I can get a lot from it or not. If I'm tired or headachy,
it just seems to go in one ear and out the other. I like it
pretty good on prose but I would still tend to shy away from
it on poetry. I think that it might be of considerable help
to me in preparing my lessons in college in things like science
and literature. I would like to say here that I think if you
are going to speed up the recording that this calls for improving
the sound quality, too. It sounds kind of blary.

The  results  from  the  evaluations  by  the  students  generally  indicated 
a  contrast  from  a  somewhat  negative  first  impression  to  a  much  more  positive 
final  evaluation.   All  but  one  of  the  students  felt  that  compressed  speech 
would  be  an  asset  in  their  school  work. 

Conclusion

Perhaps  the  most  important  outcome  of  the  experimental  program  at 
the  Tennessee  School  for  the  Blind  was  the  interest  of  the  subjects  and 
the  teacher  in  the  project.   The  teacher  felt  that  the  students  benefited 
by developing listening skills and by having more time to read more materials.
Although  she  felt  that  poetry  was  not  appropriate  for  compression,  she 
liked  the  idea  of  rapid  speech  for  supplementary  reading  in  prose  selections. 
The  new  braille  students  were  also  able  to  keep  up  with  the  class  whereas 
they  were  having  considerable  difficulty  in  braille  reading. 

Although  improvements  and  changes  were  suggested,  the  majority  of 
the  subjects  and  the  teacher  were  most  interested  and  excited  about  the 
development  and  more  extensive  use  of  compressed  speech  in  future  learning 
activities.   Their  expressed  desire  to  participate  in  future  classroom 
experimentation  indicates  their  positive  attitude  toward  this  introductory 
experience with compressed speech.


                       COMPREHENSION TEST SCORES

   N     1     2     3     4     5     6     7     8     9    10    Mean   Rank

   1    25    40    90    60    70    80    50    73    80    75    64.3      9
   2    --    --    --    --    90    80    70    87    90    70    81.2      1
   3    63   100    70    80    88    50    80    80    80    70    76.1      4
   4    55    67    70    80    60    50    70    60    70    60    64.2     10
   5    33    73    60    30    70    30    60    80    70    60    56.6     11
   6    55    73    90    80    80    60    80    60    70    60    70.8      6
   7    33    60    70    40    60    50    40    67    70    45    53.5     12
   8    55    87    90    90    70    90    70    73    85    95    80.5      2
   9    18    40    50    30    80    50    50    54    60    55    48.7     14
  10    --    --    --    50    60    80    50    73    75    --    64.6      7
  11    55    87    80    60    90    90    80    93    90    60    78.5      3
  12    55    67    60    30    88    60    50    80    80    75    64.5      8
  13    40    33    50    40    60    60    20    54    60    50    46.7     15
  14    52    80    90    80    80    50    90    73    80    80    75.5      5
  15    25    60    60    40    70    60    40    --    --    --    50.7     13

Mean  43.4  66.7  71.5  56.4  74.4  62.7    60  71.9  75.7  65.8

(N is the subject number; -- indicates that no score is listed for that test.)
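
The Mean and Rank columns of the comprehension score table can be checked with a short calculation. The following is a minimal sketch in Python and is not part of the original report. It assumes that each subject's mean is taken over only the tests for which a score is listed, and that rank 1 denotes the highest mean; the function names are illustrative.

```python
from statistics import mean

# Scores listed for each subject in the table, in test order.
# Subjects 2, 10 and 15 have fewer than ten scores; the missing tests are
# simply left out (an assumption about how the printed means were formed).
SCORES = {
    1: [25, 40, 90, 60, 70, 80, 50, 73, 80, 75],
    2: [90, 80, 70, 87, 90, 70],
    3: [63, 100, 70, 80, 88, 50, 80, 80, 80, 70],
    4: [55, 67, 70, 80, 60, 50, 70, 60, 70, 60],
    5: [33, 73, 60, 30, 70, 30, 60, 80, 70, 60],
    6: [55, 73, 90, 80, 80, 60, 80, 60, 70, 60],
    7: [33, 60, 70, 40, 60, 50, 40, 67, 70, 45],
    8: [55, 87, 90, 90, 70, 90, 70, 73, 85, 95],
    9: [18, 40, 50, 30, 80, 50, 50, 54, 60, 55],
    10: [50, 60, 80, 50, 73, 75],
    11: [55, 87, 80, 60, 90, 90, 80, 93, 90, 60],
    12: [55, 67, 60, 30, 88, 60, 50, 80, 80, 75],
    13: [40, 33, 50, 40, 60, 60, 20, 54, 60, 50],
    14: [52, 80, 90, 80, 80, 50, 90, 73, 80, 80],
    15: [25, 60, 60, 40, 70, 60, 40],
}


def subject_means(scores):
    """Return each subject's mean over the tests actually taken."""
    return {subject: mean(values) for subject, values in scores.items()}


def ranks_from_means(means):
    """Rank subjects so that the highest mean receives rank 1."""
    ordered = sorted(means, key=means.get, reverse=True)
    return {subject: place for place, subject in enumerate(ordered, start=1)}


means = subject_means(SCORES)
ranks = ranks_from_means(means)
for subject in sorted(SCORES):
    print(f"{subject:>2}  mean={means[subject]:5.1f}  rank={ranks[subject]:>2}")
```

Ranking on the unrounded means avoids ties among the subjects whose printed means differ only in the first decimal place.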


CHAPTER  IX 
Listening  and  Reading  in  Learning 

Carson  Y.  Nolan* 


It may strike some of you as peculiar that today I am giving the
only paper that does not directly deal with compressed speech. At the
American Printing House we have long been interested in compressed
speech as a solution to some of the communication problems facing the
visually handicapped. A concrete expression of our interest can be
found in the report of the first large scale study on the visually
handicapped which we carried out jointly with the University of Louisville.
Less concrete evidence exists in our several years of collaboration
with the American Institutes for Research during which we have prepared
many of the materials they have required.

The  several  papers  being  presented  here  today  deal  primarily 
with  the  technical  and  theoretical  aspects  of  the  generation  of  compressed 
speech  and  its  use.   I  believe  that  the  information  accumulated  to  date 
and  presented  in  this  meeting  justifies  the  conclusion  that  application 
of  compressed  speech  materials  can  contribute  greatly  to  the  education 
of  the  visually  handicapped  and  to  others.   However,  study  of  the  history 
of  education  shows  that  development  of  new  and  productive  educational 
devices  and  methods  in  no  way  implies  that  these  will  ever  find  their 
way  into  application  in  the  classroom  or  the  study  behavior  of  students. 
It  is  with  this  aspect  of  the  problem  that  we  at  the  American  Printing 
House  as  manufacturers  and  distributors  are   most  concerned. 

Traditionally, books in braille and large type form have been
the primary formal verbal medium for education of the visually handicapped.
This practice no doubt derives from traditions in the education
of the sighted for whom, during the last five centuries, books have
proven to be a permanent, economical and efficient vehicle for education.
For many years we have known that printed materials are not particularly
efficient communication media for the visually handicapped. Over the
last 25 years evidence has accumulated demonstrating that for some
curricular areas recordings provide a more efficient medium for communication
than do printed materials. Yet, in 1965 fewer than 2% of over 19,000
legally blind students enrolled in grades 1-12 of residential and public
schools were reported as using recorded textbook materials as their primary
means of reading. Therefore, it appeared to us that if educators have
been reluctant to capitalize on the advantages stemming from use of
auditory materials recorded at normal word rates, it would be optimistic
to expect them to jump on the compressed speech bandwagon.

With  these  things  in  mind,  it  seemed  to  us  we  could  most  help  the 
educator  of  visually  handicapped  children  by  supplying  him  with  concrete 
evidence  of  the  relative  usefulness  of  auditory  and  written  materials. 
In the event that this evidence supported use of auditory materials
at normal word rates, we could best help by exploring and developing
techniques for their use in individual study and in the classroom. For
the last three years, we have been engaged in such a program of research
and development. I would like to describe our progress to you this
afternoon.

*Dr. Carson Y. Nolan is Director of the Department of Educational
Research at the American Printing House for the Blind, Louisville, Kentucky.

Our first effort was an exploratory one in which our purpose was
to see if variables such as age, intelligence, academic achievement,
personal adjustment, etc., were related to the listening ability of the
visually handicapped in the same way as has been found for the sighted.
The results of this study indicated that they were. The study, which was
conducted in a southern residential setting consisting of dual schools for
white and Negro students, provided an interesting sidelight. Using the
STEP Listening Tests as a criterion, Negro students, on the average, scored
consistently lower than white students, by up to 45 percentile points for
one grade. This finding does not present much encouragement to those who
might wish to use commonly available auditory materials in attempts to overcome
educational deficits for such groups.

In  the  next  phase  of  our  program  we  compared  retention  of  materials 
representing  the  curricular  areas  of  literature,  social  studies,  and  science 
when  listening  and  reading  were  the  means  of  access.   Participating  in  these 
studies  were  a  total  of  1152  legally  blind  students  from  both  public  and 
residential  school  programs.   Half  this  group  were  braille  readers  and  half 
were  large  type  readers.   These  groups  in  turn  were  divided  equally  between 
the  elementary  level  (grades  4-6)  and  the  high  school  level  (grades  9-12). 
Using  a  complex  factorial  design,  the  three  studies  compared  amounts  retained 
when  students  read  or  listened  to  materials  once  or  read  or  listened  on  each 
of  three  consecutive  days. 

In  general  the  findings  were  the  same  for  large  type  as  for  braille 
readers.   In  the  elementary  grades  listening  and  reading  resulted  in  retention 
of  equal  amounts  of  material  in  every  curricular  area    studied.   In  the  high 
school  grades,  no  differences  in  retention  for  science  materials  were  found 
between listening and reading groups. However, for both literature and
social studies, retention for the reading groups was slightly but
significantly greater.

In  all  cases,  it  took  much  longer  for  the  groups  to  read  the  materials 
than  to  listen  to  them.   This  was  less  true  for  large  type  readers  than  for 
braille.   In  order  to  estimate  the  relative  efficiency  of  each  mode  of  presentation, 
the  retention  score  of  each  subject  was  divided  by  the  amount  of  time  required 
to  read  or  listen  as  appropriate.   The  resulting  ratio  represented  the  amount 
learned per minute of exposure time. When these ratios were compared for the
groups, learning by listening proved consistently most efficient. At the high
school level, listening proved to be from 183% to 248% more efficient for
braille readers and from 155% to 207% for large type readers. At the elementary
level, similar ranges were 284% - 360% for braille readers and 190% - 250% for
large type readers. It should be obvious that utilization of speech compression
would result in even greater efficiency for listening. Further research on this
is planned.
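
The efficiency measure described above, retention score divided by exposure time, can be illustrated with a few lines of Python. This is a sketch under stated assumptions rather than the procedure actually used in the studies. The scores and times below are hypothetical, the function names are illustrative, and the text does not say whether the reported percentages express the listening-to-reading ratio itself or the excess over 100%, so the sketch simply returns the ratio as a percentage.

```python
def learning_rate(retention_score, minutes):
    """Amount learned per minute of exposure: retention score / time."""
    if minutes <= 0:
        raise ValueError("exposure time must be positive")
    return retention_score / minutes


def relative_efficiency(listening_rate, reading_rate):
    """Listening rate expressed as a percentage of the reading rate."""
    if reading_rate <= 0:
        raise ValueError("reading rate must be positive")
    return 100.0 * listening_rate / reading_rate


# Hypothetical subject-level figures, chosen only for illustration:
# a listener retains 20 points after 10 minutes of listening, and a
# reader retains 22 points after 25 minutes of reading.
listening = learning_rate(20, 10)
reading = learning_rate(22, 25)
print(round(relative_efficiency(listening, reading), 1))
```

Dividing by time is what allows listening to come out ahead even where the raw retention score for reading is slightly higher, as in the hypothetical figures used here.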

Paralleling  these  studies  has  been  activity  directed  toward  identification 
of  techniques  for  efficient  individual  study.   For  this  purpose  we  have  used 
two  approaches:   interviews  of  experienced  users  of  recorded  text  materials 
and  task  analysis  of  certain  aspects  of  the  job  of  studying  when  using  auditory 
materials.   This  information  will  enable  us  to  write  the  specifications  for 
a  system  for  study  using  recorded  materials.   The  system  will  consist  of  a  work 
station  including  a  specially  designed  player  to  provide  the  informational 
input  and  a  station  for  appropriate  student  response.   The  system  development 
will  include  the  description  of  techniques  for  study  within  the  work  station. 
From  this  point  we  hope  to  move  to  research  which  will  enable  us  to  develop 
techniques  for  classroom  use  of  recorded  text  materials.   Included  will  be 
attempts  to  develop  techniques  for  teaching  listening  in  the  primary  and 
elementary  grades  as  well  as  attempts  to  develop  techniques  to  teach  curriculum 
contents  within  the  classroom  context. 


CHAPTER  X 

Recent  Research  in  the  Training  of 
Compressed  Speech  Comprehension 


Herbert L. Friedman*
David B. Orr*


Abstract 


This  paper  contains  a  summary  of  research  conducted  at  the  AIR 
into  some  variables  associated  with  the  comprehension  of  time-compressed 
speech  by  college  students.   Among  the  factors  examined  are  the  duration, 
continuity,  and  rate  of  practice  listening  in  improving  comprehension, 
the  use  of  listening  aids,  self-pacing,  reading  correlates,  retention 
of  skill  and  content  of  high  speed  speech,  and  the  subjective  responses 
of listeners.

The  findings  reported  suggest  that  comprehension  of  compressed 
speech  may  be  improved  with  practice,  that  the  limits  of  comprehension 
have  not  yet  been  reached,  that  it  is  a  feasible  technique  for  the 
educational  setting,  and  that  further  research  is  desirable  to  examine 
individual  differences  and  to  isolate  specific  factors  in  good  listening. 
An  overwhelming  number  of  experimental  subjects  have  had  a  favorable 
attitude  toward  compressed  speech.   Other  areas  of  potential  research, 
especially  with  regard  to  the  use  of  time-compressed  speech  as  a  research 
tool are mentioned.


*Dr. Herbert L. Friedman and Dr. David B. Orr are associated with the
American Institutes for Research.

The chief emphasis of the experiments on time-compressed speech that we at
the American Institutes for Research have been conducting has been the
possibility that the temporal limits of
speech  processing  may  be  stretched  with  the  right  kind  of  exposure  to  compressed 
speech.   This  research  has  been  sponsored  by  the  U.  S.  Office  of  Education.   For 
some  time  we,  along  with  many  others  (as  witnessed  by  their  presence  at  this 
Conference),  have  believed  that  the  normal  rate  of  speech  usually  does  not  tax 
the  processing  ability  of  the  ordinary  listener.   There  are   exceptions  to  this: 
on  the  stimulus  side  when  the  speech  is  very  densely  packed  with  information, 
presented  in  acoustical  noise,  highly  ambiguous,  badly  constructed,  etc.;  on 
the  listener  side,  when  the  listener  is  hard  of  hearing,  mentally  retarded, 
more  at  home  with  another  language,  or  preoccupied  with  something  else.   But, 
for  the  ordinary  listener  in  the  ordinary  situation,  the  rate  at  which  he 
receives  speech  is  not  too  fast;  in  fact,  subjectively  speaking,  it  is  often 
too slow.

The  fact  that  normal  speech  is  not  too  fast  should  not  be  surprising. 
While  "rate  of  speech"  can  be  a  very  ambiguous  term,  the  normal  college  lecture 
rate  is  often  quoted  as  being  about  125  words  per  minute.   Many  people  can  read 
quite  comfortably  at  twice  and  sometimes  three  times  that  rate.   It  would  also 
be  generally  agreed  that  people  can  think  faster  than  normal  speech.   These 
facts  suggest  that  the  processing  problem  at  least  up  to  twice  normal  speed 
is  not  cognitive.   However,  it  must  be  remembered  that  we  are    not  accustomed 
to  hearing  fast  speech.   In  fact,  we  hardly  ever  hear  speech  that  exceeds  200 
words  per  minute. 

Our  listening  habits  begin  much  earlier  than  our  reading  habits,  and 
most  people  listen  more  than  they  read  during  their  lives.   As  someone  recently 
pointed  out  to  me,  this  is  not  true  only  during  our  lives,  but  throughout  the 
history  of  man.   Until  the  advent  of  the  printing  press  he  was  never  forced  to 
process  language  at  a  rate  faster  than  normal  speech.   While  the  evidence  is 
overwhelming  that  man  can  handle  language  input  at  a  rate  greater  than  he  can 
produce  speech,  the  opportunity  to  do  so  is  quite  limited.   We  should  not,  perhaps, 
be  surprised  if  comprehension  of  high  speed  speech  by  an  untrained  listener  is 
less  than  perfect. 

The rationale for examining possibilities of training high speed comprehension
is double-barreled. On one hand, we believe that a priori evidence exists to
suggest  it  can  be  done;  on  the  other  hand  it  can  truly  be  said  that  the  need 
for  it  has  never  been  so  great.   The  information  explosion  is  prodigious.   I 
don't  suppose  anyone  has  stopped  to  count,  but  it  seems  quite  likely  that  the 
volume  of  communication  increases  each  year.   Quite  apart  from  the  population 
increase is the worldwide increase in television broadcasting and in the printed
matter  needed  to  train  and  inform  people  about  the  new  technology  both  here  and 
abroad.   There  is  a  growing  number  of  people  who  can  read  and  who  are    educated 
enough  to  comprehend  and  want  more  complex  communications.   The  amount  of  material 
taught  in  the  schools  and  colleges  is  rising  astronomically  as  is  the  number  of 
people  attending  colleges,  and  so  on.   But,  there  is  one  group  in  the  population 
which,  ever  since  the  invention  of  the  printing  press,  has  suffered  in  greater 
proportion than the rest of the populace, and which today, in an age of technological
brilliance, is ironically more handicapped than ever before, and that is the
blind. While Braille production was a tremendous breakthrough, the rate of
Braille reading is no longer enough to compensate (if it ever was). Records
and magnetic tape also provide vital help as the sponsors of this Conference
can testify. But even they, when played at normal speed, are not enough to
diminish the information gap between the sighted and the blind.

We are here to discuss the potentialities of a technological breakthrough
in its infancy which may go a long way toward making up this gap:
time-compressed speech. Since it is now feasible to time-compress speech
without  distorting  pitch,  the  chief  obstacle  to  good  intelligibility  is 
reduced.   No  doubt,  further  development  will  improve  this  process,  but  for 
the  meantime  it  is  vital  that  we  use  what  we  have. 

Outline  of  Experimentation 

While  I  have  used  the  term  "training"  in  describing  the  objectives 
of  our  research,  as  Sam  Duker  pointed  out  at  the  APA  Symposium  on  Compressed 
Speech  last  year,  we  are  really  talking  about  practice  or  exposure  rather 
than  training.   We  should  have  liked  to  train  the  comprehension  of  compressed 
speech, but the training of listening is something which is as yet largely
unexplored.

For  the  most  part,  our  experimentation  has  consisted  of  different 
types  of  exposure  to  speeded  speech  as  the  independent  variable  and  the 
measurement  of  listening  comprehension  as  the  dependent  variable.   Multiple 
choice  tests  consisting  of  25  to  30  items,  standardized  on  a  similar  college 
population,  were  used  for  the  comprehension  measure. 

Throughout  our  research  we  used  male  college  students  primarily, 
at  the  freshman  and  sophomore  level,  with  no  severe  hearing  loss  and  no 
marked  regional  accents.   We  have  habitually  asked  for  biographical  data  and 
given  a  debriefing  questionnaire  at  the  end  of  the  experiments.   Typically, 
our  subjects  are    paid  $1.50  an  hour  plus  carfare  and  have  the  chance  to  win 
bonuses  based  on  superior  performances.   Our  experiments  have  been  conducted 
in  a  sound  deadened  laboratory  with  tapes  played  free  field  either  from  a 
standard  tape  recorder  or  the  Tempo  Regulator.   We  have  used  connected 
discourse  throughout  and  tried  to  use  material  that  is  relevant  to  the 
college  age  population.   The  conditions  in  our  laboratory  are,  to  some 
extent,  an  attempt  to  approximate  classroom  conditions,  as  our  research 
has  been  primarily  directed  to  the  feasibility  of  using  compressed  speech 
in such a situation.

In  the  first  series  of  experiments  we  exposed  students  to  ten  to 
fourteen  hours  of  listening  practice  over  four  weeks  and  periodically 
examined  their  listening  proficiency.   For  practice  material  in  this  series 
we  used  popular  novels  which  were  easily  understood  and  relatively  interesting  to 
a  college  population.   As  test  material  we  used  a  college  level  textbook 
on  English  history  during  the  period  of  American  colonization.   We  had  had 
prior  evidence  from  our  own  pilot  studies  and  other  research  as  well  as 
simple  observation  to  suggest  that   up  to  about  275-300  words  per  minute, 
there  was  little  or  no  loss  of  comprehension.   We,  therefore,  began  this 
experiment  with  practice  at  325  wpm,  increased  it  to  375  wpm  the  next  week, 
stayed  with  375  the  third  week  because  we  were  not  satisfied  with  the  progress, 
and  went  to  425  the  fourth  week.   In  addition  to  tests  at  these  speeds,  we 
also  tested  our  subjects'  initial  performance  at  normal  recording  speed 
(175  wpm)  and  at  very  high  speed  (475  wpm).   We  repeated  the  475  wpm  test 
at  the  end  of  the  experiment  to  measure  change. 
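
The practice and test rates in this series can be related to the normal recording speed with simple arithmetic. The sketch below is illustrative only: it takes the 175 wpm normal recording speed given above as the baseline, the function names are invented for the example, and the 3,500-word passage length is a hypothetical figure.

```python
NORMAL_WPM = 175  # normal recording speed reported for the test passages

# Weekly practice rates in the first experiment, as described in the text.
PRACTICE_WPM = {1: 325, 2: 375, 3: 375, 4: 425}
VERY_HIGH_WPM = 475  # rate used for the passage tested before and after


def speedup(wpm, normal_wpm=NORMAL_WPM):
    """Ratio of a presentation rate to the normal recording rate."""
    return wpm / normal_wpm


def playing_minutes(word_count, wpm):
    """Minutes needed to play a passage of word_count words at wpm."""
    return word_count / wpm


for week, wpm in PRACTICE_WPM.items():
    print(f"week {week}: {wpm} wpm, {speedup(wpm):.2f} x normal recording speed")

# A hypothetical 3,500-word passage at the normal and the highest rate.
print(playing_minutes(3500, NORMAL_WPM), round(playing_minutes(3500, VERY_HIGH_WPM), 1))
```

The ratios depend on the baseline chosen; measured against the 125 wpm lecture rate quoted earlier, the same presentation rates would give larger multiples.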

This  general  pattern  was  used  several  times.   In  the  next  experiment 
we  introduced  three  minute  breaks  every  ten  minutes  with  the  intention 
of  improving  performance.   We  also  ran  a  group  of  female  college  students 
in  this  study.   Finally  in  this  series,  we  ran  an  experiment  identical 
to the first one except that all practice listening was at 425 wpm, although
the tests were administered at the same graduated speeds as in the previous
experiments. In addition to the above groups, we ran a control group which
received no practice listening but received the same test material and
tests. Each of these groups contained 16 to 18 subjects.

The  results  of  this  series  were  fairly  encouraging.   Performance 
on  the  final  new  passage  at  425  wpm  was  significantly  better  for  all 
experimental  groups  than  for  the  control  group.   Performance  on  the 
repeated  475  wpm  passage  at  the  end  of  the  experiment  was  significantly 
better  for  three  of  the  four  groups,  and  better  but  not  significantly  so 
for the fourth group. The mean score for all experimental groups combined
at the 425 wpm rate was about 66% of their normal speed scores, compared
with 50% for the control group. The differences between male and female
subjects were not significant, and the overall differences between experimental
groups  were  not  statistically  significant,  although  the  mean  scores  for  the 
graduated  practice  and  high  speed  practice  were  both  considerably  better 
than  for  the  interrupted  practice  groups. 
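
Scores in these experiments are reported as a percentage of the same listeners' normal speed scores. A minimal sketch of that normalization for a group is given below. The raw scores are hypothetical, the function names are illustrative, and averaging the per-subject percentages (rather than dividing one group mean by another) is an assumption, since the text does not state which was done.

```python
from statistics import mean


def percent_of_normal(compressed_score, normal_score):
    """Compressed-speech score as a percentage of the normal speed score."""
    if normal_score <= 0:
        raise ValueError("normal speed score must be positive")
    return 100.0 * compressed_score / normal_score


def group_mean_percent(score_pairs):
    """Mean percent-of-normal over (compressed, normal) pairs for a group."""
    return mean(percent_of_normal(c, n) for c, n in score_pairs)


# Hypothetical raw scores on a 30-item test: (score at 425 wpm, normal score).
practice_group = [(16, 24), (20, 25), (15, 24)]
control_group = [(12, 24), (13, 25), (11, 23)]

print(round(group_mean_percent(practice_group), 1))
print(round(group_mean_percent(control_group), 1))
```

Expressing each score relative to the listener's own normal speed performance removes differences in general listening ability from the comparison between groups.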

In  addition  to  the  above  findings,  all  of  the  above  groups  were 
given  alternate  forms  of  the  Nelson-Denny  Reading  Test,  which  measures 
reading  rate,  vocabulary,  and  reading  comprehension,  before  and  after 
experimentation  with  compressed  speech.   But,  there  have  been  no  consistent 
major  changes  in  reading  performance  over  all  experiments.   The  same  is 
true  for  the  measurement  of  listening  at  normal  speed  before  and  after 
the  experiment.   We  have  yet  to  find  a  significant  change  though  we  are 
not  yet  completely  convinced  that  improvement  in  compressed  speech  listening 
will  have  no  effect  on  normal  speech  listening. 

Following  this  series  of  experiments  which  determined  that  ten 
to  fifteen  hours  of  practice  spread  over  four  weeks  was  effective  in 
improving  performance  relative  to  a  control  group,  it  was  hypothesized 
that  intensive  practice  over  a  shorter  period  of  time  might  be  even  more 
effective.   It  was  determined  to  expose  subjects  to  eight  hours  of  listening 
a day, for five days, to material presented at high speed, about 425 wpm.
The  same  type  of  material  (popular  novels)  was  used  for  practice,  and  the 
historical  passage  was  used  for  tests.   Unlike  the  previous  experiments, 
all  tests  were  presented  at  the  single  speed  of  425  wpm. 

The  results  of  the  "immersion  study"  as  we  call  it  were  satisfactory, 
with  the  mean  scores  of  subjects  showing  a  fairly  steady  progression  upwards 
from  40.4%  of  normal  score  on  the  first  day  to  70.0%  on  the  fifth  day,  with 
a  setback  on  the  second  day.   However,  although  for  a  rate  of  425  wpm  or  about 
three  times  that  of  normal  speed,  70%  comprehension  on  the  first  hearing  of 
a new passage is fairly good, the results were no better than those of the
original  graduated  practice  group  which  achieved  almost  80%  comprehension 
or  the  high  speed  practice  group  which  achieved  about  70%  comprehension 
at  425  wpm,  in  spite  of  the  fact  that  the  immersion  group  had  almost  three 
times  as  much  exposure  to  compressed  speech  as  other  experimental  groups. 
This  group  of  subjects  in  common  with  the  others,  showed  a  significant 
improvement  in  performance  on  a  very  high  speed  (475  wpm)  passage  presented 
before  and  after  practice  listening,  with  a  mean  change  in  score  from  20 
to  52%  of  normal.   Once  again,  however,  there  was  no  significant  change  in 
reading  scores  as  measured  by  the  Nelson-Denny  Reading  Test. 

In  the  next  experiment  undertaken,  designated  the  criterion  study, 
the  main  objective  was  to  determine  the  amount  of  practice  listening  necessary 
to  reach  the  criterion  of  90%  of  normal  speed  comprehension  at  a  rate  of 
375  wpm.   In  previous  experiments,  while  significant  improvement  had  occurred 
at higher speeds (425 and 475 wpm), the mean scores were well below 90%
of normal scores. In this experiment a new measure was introduced in
the  hope  of  reducing  dependence  on  the  somewhat  cumbersome  multiple 
choice  test  which  could  feasibly  be  presented  only  at  the  end  of  each 
passage.   At  periodic  intervals,  the  tape  was  stopped  and  the  listeners 
requested  to  identify  the  last  word  heard.   This  measure  of  intelligibility 
was  to  be  compared  with  comprehension  scores  in  the  hope  of  finding  a 
high  positive  correlation. 

In  this  study,  five  benchmark  passages  were  presented  at  375  wpm 
after  two,  seven,  nine  and  one-half,  and  sixteen  hours  of  practice. 
Following  each  test,  the  passage  was  re-administered  with  interruptions 
for  the  measurement  of  "last  word  intelligibility".   Practice  listening 
consisted  of  three  of  the  most  popular  of  the  novels  used  in  the  immersion 
study (all of which had been used in previous experiments). The usual
normal speed benchmark passage was presented, as were alternate forms of
the Nelson-Denny, before and after experimental practice.

The  results  of  this  experiment  were  somewhat  disappointing. 
Although  seven  of  the  ten  subjects  achieved  90%  or  better  at  some  point 
during the experiment, there was a great deal of individual variation
and no apparent upward progression, unlike all other experiments we have
run. The mean percentage on the second administration of each test was
98%, not significantly different from normal. However, it should be
remembered  that  not  only  had  they  heard  the  passages  once  but  the  test 
had  also  been  taken  once.   It  is  possible  that,  although  the  usual 
payments  and  bonuses  (based  on  the  first  test)  were  made  to  subjects, 
motivation  to  listen  closely  to  the  test  material  was  reduced  because 
of the expectancy of a second try. The intelligibility measure did not
prove to be a satisfactory correlate of the comprehension scores,
probably because intelligibility as we measured it was very high,
even when comprehension scores were low.

This  experiment  was  continued  on  a  pilot  study  basis  with  five 
of  the  best  subjects  in  an  effort  to  go  on  to  criterion  at  425  wpm  by 
presenting  material  which  was  similar  in  nature  to  the  test  material. 
The  results  of  that  attempt,  however,  do  not  merit  reporting  here. 

In  addition  to  the  studies  examining  the  trainability  of  compressed 
speech  comprehension,  several  small  studies  have  been  run  to  examine  the 
ability  of  our  subjects  to  retain  information  which  has  been  heard  at 
high  speed  and  their  ability  to  retain  the  skill  in  listening  to  compressed 
speech  that  they  have  acquired.   Because  of  lack  of  time,  I  will  not  go 
into  details  of  these  three  experiments,  except  to  report  that,  within 
rather  narrow  limits  of  these  experiments,  we  have  not  found  significant 
retention  of  the  speeded  speech  listening  skill  developed  during  practice; 
but,  we  have  found  that  material  learned  under  compressed  conditions  is 
retained  at  least  as  well  as  material  heard  at  normal  speed. 

During  the  course  of  our  experimentation,  one  hypothesis  had  been 
suggested  for  which  we  did  not  have  hard  and  fast  data  but  rather  accumulated 
subjective  evidence  and  tentative  conclusion  from  a  number  of  correlations 
we  had  run.   It  seemed  to  us  that  the  better  listeners  to  compressed  speech 
listened  to  it  much  the  way  they  listened  to  normal  speech,  i.e.,  they 
listened  to  phrases  and  sentences  rather  than  individual  words.   Up  to 
this  point,  subjects  were  given  practice  in  speed  listening  but  were  not 
really  prepared  for  the  content  of  what  was  to  come.   On  the  assumption 
that  at  very  high  speeds  there  is  a  shortage  of  time  in  which  to  process 
the  input,  preparation  for  content  was  intended  to  reduce  the  alternatives 
in  what  was  to  come,  thereby  hastening  identification  of  material. 

With  this  end  in  mind,  summaries  of  the  passages  which  carefully 
avoided giving away answers to the test questions and lists of key words
in the passages were prepared and administered to two groups of subjects
immediately prior to hearing the compressed passage. A third group
acted  as  a  control  and  received  no  preparation.   Twenty-two  subjects 
completed  the  experiment. 

Once  again  our  standardized  test  passages  were  used  to  measure 
initial  normal  speed  performance  and  were  used  as  benchmark  tests  throughout 
the experiment. One hour of practice listening was provided each day
for  seven  consecutive  weekdays,  followed  by  a  passage  and  test.   All 
material  was  presented  at  375  wpm.   A  final  passage  at  this  speed  was 
presented  to  all  subjects  without  listening  aids. 

The  results  of  this  experiment  were  surprising  to  us.   Neither  the 
precis  nor  the  key  word  lists  made  any  significant  difference  to  subject 
performance  relative  to  the  control  group.   However,  all  three  groups 
showed  a  fairly  steady  improvement  in  performance  with  a  mean  performance 
of  about  93%  of  normal  score  on  the  last  passage  presented  to  all  subjects 
without  listening  aids.   For  the  last  passage  presented  with  listening  aids, 
the  precis  group  achieved  92%  of  normal,  the  key  word  group  98%,  and  the 
control  group  88%. 

To  test  the  contribution  of  the  summary  to  performance  on  the 
test,  one  final  test  was  given  immediately  after  exposure  to  a  summary 
without  presentation  of  a  passage.   One  matched  half  of  each  of  the 
three  groups  received  the  summary  while  the  other  half  acted  as  the 
control. There was no significant difference in performance between the two
combined halves.

Two questions arose out of this data. Why weren't the listening
aids  more  effective,  and  why  were  overall  results  at  375  wpm  better 
than  previous  experimental  results?   In  answer  to  the  former  question, 
we  felt  that  in  preparing  the  summaries  to  avoid  giving  away  answers 
on  the  test  we  may  have  misled  the  listener.   To  remedy  this  we  are    planning 
another  experiment  to  re-examine  this  question.   Some  of  our  subjective 
data  suggested  that  the  key  word  list  may  have  had  a  somewhat  detrimental 
effect  to  counter  any  aid  it  provided  in  that  some  subjects  were  listening 
for  the  key  words  rather  than  overall  meaning. 

The  achievement  of  90%  performance  of  normal  scores  may  have 
resulted  from  a  combination  of  the  best  features  of  previous  experiments. 
The  study  was  run  on  a  daily  basis,  with  an  hour  and  a  half  participation. 
Practice  material  was  presented  for  about  fifty  minutes  without  interruption, 
and  the  proportion  of  practice  material  to  test  material  was  fairly  low. 

In  addition  to  the  listening  data,  in  this  experiment  we  also 
attempted  to  find  correlates  of  listening  performance  by  administering  a 
battery  of  information  and  aptitude  tests  to  the  subjects  at  the  end  of 
the experiment. These results were analyzed in terms of their correlations
with normal speed, initial high speed, and final high speed scores, and
also  by  looking  at  the  good,  medium,  and  bad  subjects  separately. 

Time  does  not  permit  detailed  description  here,  but  it  may  be 
said  in  general  that,  while  the  listening  correlations  were  low,  language 
handling  ability  seemed  to  be  most  relevant  to  good  listening  both  at 
normal  speed  and  at  final  high  speed,  while  specific  memory  skills  were 
inversely  related  to  listening  scores  particularly  at  the  beginning  of 
high  speed  listening,  suggesting  that  excessive  attention  to  detail  is 
probably  antithetical  to  good  listening  in  these  conditions. 

The  last  major  study  we  have  done  is  one  which  examines  the 
self-pacing  characteristics  of  listeners.   In  this  experiment,  after  an 
initial  measure  of  performance  at  normal  speed,  subjects  were  given 
half  their  material  at  approximately  one  and  one-half  times  normal  speed 
and  left  to  find  their  own  level  of  speed  during  the  remaining  half.   In 
this  experiment,  no  practice  listening  was  provided.   All  material  consisted 
of the standardized test passages.

The results of this experiment have not yet been fully analyzed.
However, we may say a few things about it. The first high speed passage
was externally paced at 1.5 times normal speed, and the twelve subjects
thereafter seemed to take that as a guide and stay in that range, although
they were encouraged to raise the rate as high as they could and still
comprehend.   Performance  on  the  tests  was  highly  varied  with  a  mean 
for  both  externally  paced  and  self  paced  at  about  82%  of  normal.   Speed 
again seemed the most crucial factor, rather than self versus external
pacing. An upward trend was again apparent with the last test in each
category  at  90  or  S\%   of  normal. 

One  final  word  about  our  results.   In  each  of  our  experiments 
debriefing questionnaires were administered. These have provided us with
useful  subjective  data.   The  vast  majority  of  our  subjects  have  felt  that 
practice  listening  is  useful,  that  more  practice  would  be  desirable, 
and  that  there  is  a  place  for  compressed  speech  in  their  college  curricula. 
While  we  have  attempted  wherever  possible  to  objectify  our  data,  we 
feel  the  relatively  high  degree  of  acceptance  of  compressed  speech  among 
our  college  subjects  is  a  finding  of  vital  importance  to  the  potential 
application  of  compressed  speech  in  the  college  setting. 

We  have  plans  for  further  research  in  the  near  future  which  include 
a  comparison  of  the  suitability  of  other  types  of  material  for  compression, 
psychological  and  geological;  the  usefulness  of  compressed  speech  as  a 
review  technique;  an  examination  of  reading  and  listening  comprehension 
by  speed;  and  a  further  examination  of  the  efficacy  of  listening  aids. 

But  one  of  the  most  pressing  needs  for  further  research  in  this 
area is the determination of the characteristics of good listening not
only for compressed speech but for oral communication in general. Some
of  the  research  we  hope  to  do  will  examine  listening  in  an  attempt  to 
analyze  and  isolate  specific  characteristics  that  are    important  in  the 
processing  of  speech.   To  achieve  this,  we  intend  to  use  time-compressed 
speech  as  a  research  tool--a  role  for  which  we  feel  it  is  uniquely  suited. 

We  also  feel  that  the  weight  of  the  evidence  of  our  research  and 
that  of  others  points  to  the  feasibility  of  using  compressed  speech  in 
the  educational  setting  and  that  the  time  is  ripe  to  take  it  into  the 
field and put it to the test.

CHAPTER XI

The Eltro Information Rate Changer Mark II:
Simple Quality Speech Compression

Hugh S. Allen, Jr.*

Thank you, and good afternoon, ladies and gentlemen. It is a
privilege to be invited to address you today. As suppliers of hardware
to produce compressed speech, we are particularly grateful for this
opportunity to demonstrate and explain the Eltro Information Rate Changer
Mark II.

To give tangible meaning to the title of my talk, I shall now
continue by means of tape recording.

TAPE   CONTENT 

The speed of the instrument now is at the nominal 15 ips at which
the tape recording was made. As I proceed, I will compress certain sections
in varying degrees and expand other sections as well.

Since my allotted time is twenty minutes, it would be possible to
deliver  in  this  period  nearly  forty  minutes  of  material  at  the  full  compression 
rate  of  the  machine.   Although  I  shall  not  use  the  full  compression  rate,  the 
tangible  advantages  of  speech  compression  will  be  obvious. 

In  outline,  I  shall  cover  the  history  and  development  of  the  Eltro 
Information  Rate  Changer  and  the  technical  principles  behind  its  operation. 

In  the  early  1950's,  the  late  Anton  Springer  invented  and  patented  what 
he  called  an  acoustical  pitch  and  tempo  regulator.   This  device  was  produced  by 
the  German  telephone  equipment  manufacturing  firm  of  Telephonbau  und  Normalzeit 
in  Frankfurt,  for  whom  Mr.  Springer  was  a  research  scientist.   Worldwide  patents 
were obtained covering numerous aspects of his invention, and many U.S. patents
have been granted. The patent numbers are included in the printed copy of this
talk for reference purposes. In 1960, Anton Springer left TeleNorm and joined
the  firm  of  Eltro  &  Company  GmbH  in  Heidelberg.   This  firm  is  associated  with 
the  Hughes  Tool  and  Machine  Company  in  the  United  States.   In  1962,  Eltro 
continued  the  manufacture  of  the  tempo  regulator  model  MLR  38/15,  in  the  same 
general  form  as  the  previous  TeleNorm  instrument  but  with  certain  improvements. 
Between  the  old  TN  machines  and  the  Eltro  model,  perhaps  some  thirty  of  these 
units  have  found  their  way  into  research  laboratories,  universities,  and  activities 
for  the  blind  in  this  country.   These  devices  remain  laboratory  instruments, 
requiring  considerable  skill  to  operate  as  well  as  patience  in  accommodating 
to certain operational shortcomings.

The unfortunate and premature demise of Mr. Springer in August of 1964
occasioned a complete review of the patents which became the property of Eltro
Automation, and set in motion a research and development program, the first
result of which is the Mark II device being used today.

Before  proceeding  further,  let  me  explain  what  this  machine  does. 
In  a  few  words,  it  permits  a  1/4"  magnetic  tape,  recorded  at  15  inches 
per  second,  to  be  played  back  at  nearly  double  rate  --  actually,  in  as  little 
as 53% of the original time, or conversely, at one half the recorded
velocity -- for an expansion to 200% of the original time, and any desired
setting between these extremes. Most importantly, this takes place without
even the most minute change of pitch which would disguise the original sound.
This function is accomplished by turning only one knob to the desired
percentage of compression or expansion, as you are now witnessing and
hearing.

How  is  this  done  so  simply?   The  answer  is  that  the  device  has  been 
made  simple  in  operation.   The  underlying  theory  and  the  mechanics  of 
execution are another matter. I shall attempt the explanation with an
apology to those of you who are not technically oriented and also to
those  for  whom  the  explanation  will  not  be  sufficiently  technical. 

The pitch or frequency of a recorded sound has a relationship
to the wavelength of a tape recorded sound through the common factor, the
velocity of tape travel: V = λf. Velocity equals the product of wavelength
and frequency. For a defined velocity of tape travel, the smaller
the wavelength the higher the frequency or pitch of the sound. The
wavelength of the recorded sound is determined by the permanent arrangement
of the magnetic particles of the tape. It follows that if the rate of
tape travel is varied, the frequency of a recorded wavelength will vary
in direct proportion. At double tape speed, frequencies are doubled,
or in other words, pitch has been increased one octave. Only when a
tape recording passes over the playback head gap at the same velocity
at which it was recorded are the wavelengths translated into the correct
frequencies.
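The relationship between tape velocity and pitch can be sketched numerically. The following is only an illustrative calculation, not part of the talk; the function names and the 440-cycle test tone are assumptions chosen for the example.

```python
import math


def playback_frequency(recorded_hz, recorded_ips, playback_ips):
    """Frequency heard when a tape recorded at one speed is played at another.

    The wavelength on the tape is fixed at recording time
    (wavelength = recorded_ips / recorded_hz), so on playback
    f = playback_ips / wavelength = recorded_hz * playback_ips / recorded_ips.
    """
    return recorded_hz * playback_ips / recorded_ips


def octave_shift(recorded_ips, playback_ips):
    """Pitch change in octaves; +1.0 means one octave up."""
    return math.log2(playback_ips / recorded_ips)


if __name__ == "__main__":
    # A hypothetical 440-cycle tone recorded at 15 ips, played at 30 ips.
    print(playback_frequency(440.0, 15.0, 30.0))
    print(octave_shift(15.0, 30.0))
```

Doubling the playback velocity doubles every frequency, which is the one-octave rise described above; halving it lowers the pitch by one octave.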

Many  devices  have  been  constructed  which  will  play  tapes  or  disks 
at  a  velocity  different  from  that  at  which  they  were  recorded.   Everyone 
has experienced the "Donald Duck" effect and its opposite. The trainability
of the comprehension of speeded speech using such techniques
cannot be denied. However, there are serious shortcomings involved
in the personality alteration of the speaker which reduce the efficacy
of such techniques. To continue: obviously, another way to change the
propagation velocity would be to move the playback head instead of
the tape, since it is always relative velocity which is significant.
A practical way to accomplish this would be to mount a number of playback
heads  on  a  wheel  and  bend  or  wrap  the  tape  in  such  a  way  that  as  one  gap 
of  a  playback  head  begins  to  lose  contact  with  the  tape,  another  will  just 
begin  to  make  contact  so  as  to  have  a  continuous  perceived  output  of 
sound.   This  is  a  rudimentary  approach,  and  Springer's  patents  cover  the 
construction  of  many  different  and  sophisticated  types  of  rotating  heads. 
However,  rotating  heads  as  such  are  not  particularly  unusual,  and  have 
been  employed  for  many  years  in  a  number  of  different  applications. 
Certain  criteria  relevant  to  the  construction  of  the  rotating  head 
will  be  introduced  later. 

If  the  tape  were  to  stand  still  and  the  head  to  rotate,  only  a 
small  section  of  the  tape  would  be  played,  but  when  the  linear  velocity 
of  the  surface  of  the  head  and  the  gaps  thereon  became  equal  to  the 
linear  velocity  at  which  the  tape  was  recorded,  this  small  section  would 
be reproduced in correct pitch. This example may seem academic, but
it  does  exemplify  one  of  the  fundamentals  involved. 

To  go  further,  we  can  also  transport  the  tape  and  rotate  the  head 
at the same time. This is a practical development -- a pitch changer.
If  a  standard  tape  machine  were  equipped  with  a  rotating  head  instead 
of  a  fixed  one,  the  effective  linear  velocity  of  tape  to  head  gap  could 
be varied at will. If, for example, the recorded tape velocity were 15
ips,  and  the  tape  were  being  transported  through  the  tape  machine  by 
the  driving  capstan  at  this  velocity,  the  pitch  of  the  reproduced  material 
would  be  correct  only  if  the  rotatable  head  were  stationary.   If  the  head 
were  rotated  in  the  same  direction  as  tape  travel,  the  relative  head 
velocity  would  be  the  difference  between  the  two  velocities  and  less  than 
the  recorded  velocity.   A  limit  is  reached  when  the  linear  velocity  of 
the  head  gaps  would  equal  the  velocity  of  tape  travel.   The  result  is 
that  pitch  is  steadily  lowered  to  pitch  zero,  the  limit  where  there 
would  be  no  sound  output  at  all,  similar  to  a  tape  standing  still  over 
a fixed head.

Rotating  the  head  in  the  opposite  direction  to  tape  travel  would 
cause  an  increase  in  relative  velocity  and  increase  in  pitch.   If  the 
linear  velocity  of  the  rotating  head  were  to  equal  the  tape  transport 
velocity,  the  relative  tape  to  playback  gap  velocity  would  be  doubled 
and the pitch raised one octave.

We  have  seen  what  occurs  when  a  tape  is  transported  over  a 
fixed  head  at  velocities  different  from  that  at  which  it  was  recorded. 
We  have  seen  the  result  of  rotating  a  head  over  a  stationary  tape, 
and  finally  we  have  seen  the  effect  of  rotating  the  playback  head 
against  a  tape  being  transported  at  its  recorded  velocity.   Now  for 
the  next  step:   We  could  record  a  tape  at  a  certain  velocity  --  say 
7-1/2  ips  --  and  play  it  back  at  15  ips,  accepting  the  consequent  one 
octave pitch increase. Then we could rotate the playback head in the
direction  of  tape  travel  at  a  speed  which  would  reduce  the  effective 
tape  to  gap  velocity  to  7-1/2  ips,  and  the  pitch  would  be  restored 
to  normal;  but  what  has  happened?   The  subject  matter  is  being 
retrieved  at  twice  the  rate  at  which  it  was  recorded  or  a  50%  compression 
of  content  has  taken  place. 
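The cases walked through above (fixed head, head rotating with or against the tape, and the two-step compression example) can be checked with a small calculation. This is a sketch under simple assumptions: velocities are treated as signed linear velocities in ips, a positive head velocity means the head surface moves in the direction of tape travel, and the function names are hypothetical.

```python
def relative_velocity(transport_ips, head_surface_ips):
    """Tape-to-gap velocity.

    head_surface_ips > 0: head surface moves with the tape.
    head_surface_ips < 0: head surface moves against the tape.
    """
    return transport_ips - head_surface_ips


def playback_summary(recorded_ips, transport_ips, head_surface_ips):
    """Return (pitch_ratio, time_fraction) for one playback arrangement.

    pitch_ratio   1.0 means original pitch, 2.0 one octave up, 0.0 no output.
    time_fraction fraction of the original running time taken by playback.
    """
    rel = relative_velocity(transport_ips, head_surface_ips)
    pitch_ratio = rel / recorded_ips
    time_fraction = recorded_ips / transport_ips
    return pitch_ratio, time_fraction


if __name__ == "__main__":
    print(playback_summary(15.0, 15.0, 0.0))    # stationary head
    print(playback_summary(15.0, 15.0, 15.0))   # head keeps pace with tape
    print(playback_summary(15.0, 15.0, -15.0))  # head runs against tape
    print(playback_summary(7.5, 15.0, 7.5))     # the two-step compression case
```

In the last case the pitch ratio is 1.0 while the running time is halved, which is the 50% compression described in the talk.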

There  is  one  more  step  which  can  be  taken,  and  it  is  this  significant 
step  which  was  Anton  Springer's  most  important  contribution.   Instead  of 
forcing the compression--or expansion, for that matter--in two steps,
Springer integrated the two functions of variable tape travel velocity
and velocity of the rotating head, and differentially locked these two
functions in such a way that, as the transport speed of the tape is
varied  from  the  nominal  recorded  velocity  of  the  tape,  the  head 
automatically  rotates  in  the  proper  direction  to  maintain  correct  pitch. 

Why  pitch  has  not  changed  should  already  be  understood.   To 
understand  why  compression  has  taken  place,  it  is  necessary  to  return 
to  the  example  of  the  rotating  head  standing  still,  reproducing  a 
tape  being  transported  at  its  recorded  velocity.   Again,  15  ips  is 
taken as nominal.

It  was  mentioned  that  to  have  a  continuous  perceived  sound 
output,  one  gap  should  just  be  gaining  contact  as  the  one  preceding  is 
losing contact. This describes the static condition which becomes
less and less true with greater degrees of compression and expansion,
as we shall see. Further, "perceived" is the key word to describe the
physiological characteristic of hearing under conditions of compression
or expansion.

The reason the Eltro device functions at all is because of a
limitation of the hearing function, called the Haas effect. Haas
discovered that the ear cannot perceive sounds shorter than 35
milliseconds as distinctly separate sounds. It is this deficiency
of the ear which permits the Eltro instrument to perform as it does.

Now is the appropriate time to discuss the construction of the
rotating playback head. Among the controlling parameters are the following:

1) Accurate quadrature relationship of the gaps.

2) All gaps held accurately at right angles to direction of tape
travel.

3) Output of all four gaps must be within 1 dB at all frequencies
to 15,000 cycles.

4) Output of head must be high for extra advantage over head
contact noise.

5) At the relative linear velocity used, the distance between
head gaps must be scanned in less than 35 msec.

Point 5 above is most significant and germane to the understanding
of how compression and expansion occur. The Eltro rotating head has a
gap spacing of 13.35 mm. At 15 ips, the time to traverse gaps is 34.68
msec, just under the Haas effect limit.
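The traverse time follows from the gap spacing and the relative velocity. The short calculation below is a sketch only: it reads the quoted gap spacing as 13.35 mm and converts 15 ips to millimetres per second, and the function name is an assumption. It comes out at roughly 35 msec, close to the figure quoted in the talk; the small difference may reflect rounding in the stated dimensions.

```python
MM_PER_INCH = 25.4


def traverse_time_msec(gap_spacing_mm, relative_ips):
    """Time for one head gap to scan the distance between adjacent gaps."""
    relative_mm_per_sec = relative_ips * MM_PER_INCH
    return gap_spacing_mm / relative_mm_per_sec * 1000.0


if __name__ == "__main__":
    # Gap spacing read as 13.35 mm (assumption), relative velocity 15 ips.
    print(round(traverse_time_msec(13.35, 15.0), 2))
```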

Springer, in describing the operation of his invention, viewed
the tape as though it were a continuous series of 13.35 mm segments. Such
a viewpoint helps to simplify the explanation of what actually takes
place in compressing and expanding material with the Eltro Information
Rate Changer.

Figure 1 shows a number of 13.35 mm sections. Tape velocity is
the nominal 15 ips. Playback is at normal rate.


Fig. 1

Figure 2 shows the result of reproduction with transport velocity
of 16.5 ips.

Since the tape transport velocity is some 10% greater than the 15
ips relative playback velocity, every 10th 13.35 mm segment of the tape
catches up, is not scanned by a gap, and is therefore discarded. The
segments immediately preceding and following the discard are connected
together since they were read by consecutive head gaps. Hence 10%
compression was gained at the expense of a lost 35.68 msec of material,
a sound too short to be perceived as having identity of its own.

Figure  3  shows  the  effect  of  reproduction  with  transport  velocity 
of 18 ips for a compression of 20%. Every 5th segment is discarded.


Fig. 3:   2 3 4 6 7 8 9 11 12 13 14 16 17 18 19 21 22 23 23


Figure  4  shows  the  effect  of  reproduction  with  a  transport  velocity 
of  30  ips  for  50%  compression.   Every  other  segment  is  discarded. 


Fig. 4:

The converse, speech time expansion, can be similarly explained. In
this function, the tape velocity is slower than the relative playback velocity.
The head gaps catch up with tape and repeat a percentage of 13.35 mm segments.

Figure  5  shows  maximum  expansion  with  the  Eltro  device.   Tape  transport 
velocity is 7-1/2 ips. Every 13.35 mm element is repeated for 200% expansion.


Fig. 5:   1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10
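The segment patterns in the figures can be reproduced with a very simple model of Springer's segment view. This is a sketch under stated assumptions, not a description of the actual mechanism: each head-gap scan is taken to play exactly one segment, and between scans the tape is taken to advance by the ratio of transport velocity to recorded velocity, measured in segments. The function name is hypothetical.

```python
from fractions import Fraction
from math import floor


def played_segments(speed_ratio, count):
    """Segment numbers (1-based) heard during `count` successive gap scans.

    speed_ratio is transport velocity divided by recorded velocity.
    Ratios above 1 skip segments (compression); ratios below 1 repeat
    segments (expansion).
    """
    ratio = Fraction(speed_ratio)
    return [floor(k * ratio) + 1 for k in range(count)]


if __name__ == "__main__":
    print(played_segments(Fraction(2), 5))      # 30 ips: every other segment
    print(played_segments(Fraction(1, 2), 6))   # 7-1/2 ips: each segment twice
    print(played_segments(Fraction(5, 4), 9))   # every 5th segment dropped
    print(played_segments(Fraction(6, 5), 6))   # 18 ips exactly
```

Under this model a ratio of 5/4 (18.75 ips) drops every fifth segment, while exactly 18 ips (ratio 6/5) drops every sixth, so the round transport velocities quoted with the figures are approximations.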

As you listen to this maximum rate of expansion, you are hearing the
superimposition of an interfering signal. This is the rate at which segments
are being repeated. Although each is too short to have individual character,
the repetition rate itself is a new sound. This is an unavoidable consequence
of the rotational velocity of the head in expansion. You will notice that
it  decreases  as  expansion  is  reduced.   Further,  you  will  have  noticed  that 
this  sound  is  not  present  with  compression  as  the  rate  of  discarded  material 
is  not  noticeable  as  a  new  sound. 

In  the  interests  of  clarity  I  have  taken  certain  mathematical 
liberties. In the two examples of 10% and 20% compression, the tape transport
velocities are not exactly 16.5 ips and 18 ips respectively. Figure
6  shows  the  relationship  of  tape  transport  velocity  to  degree  of  compression 
or  expansion.   The  relationship  XY  =  1  deviates  slightly  from  a  straight 
line  for  small  degrees  of  compression  or  expansion.   The  error  would  be 
appreciable  with  higher  percentages  of  compression  or  expansion.   The 
Eltro Mark II Information Rate Changer is the first generation of devices
coming  from  Springer's  invention  to  permit  direct  settings  in  percentage 
of  original  time,  thus  automatically  performing  the  mathematical  conversion. 
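The conversion the knob performs can be illustrated as follows. This is a sketch that assumes X in the relationship XY = 1 stands for transport velocity as a multiple of the nominal 15 ips and Y for playing time as a fraction of the original; the talk does not define the symbols, and the function name is hypothetical.

```python
NOMINAL_IPS = 15.0


def transport_velocity_ips(percent_of_original_time):
    """Tape transport velocity needed for a given playing time.

    With X = velocity / nominal and Y = percent / 100, XY = 1 gives
    velocity = nominal * 100 / percent.
    """
    return NOMINAL_IPS * 100.0 / percent_of_original_time


if __name__ == "__main__":
    for percent in (200, 100, 90, 80, 53, 50):
        print(percent, round(transport_velocity_ips(percent), 2))
```

On this reading, 10% and 20% compression call for about 16.67 and 18.75 ips rather than 16.5 and 18 ips, and the 53% setting calls for a little over 28 ips, in line with the "nearly 30 ips" upper limit mentioned in the talk.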

You  have  observed  that  I  have  been  turning  only  one  knob  which  has 
caused  my  voice  to  be  heard  at  different  delivery  rates.   In  operation, 
this  is  just  how  simple  it  is  to  achieve  compression  or  expansion  of 
recorded  material  with  this  device. 

Technically, turning this control knob causes a number of operations
to take place within the instrument. Figure 7 will help to explain the
function. Actually, the knob controls a linkage between two AC motors.
The linkage serves to control the positions of two rubber tired wheels,
each of which rides on the surface of one of the two disks. These disks
are fixed to the rotors of the two motors. Motor #1 has a fixed speed
of 900 rpm and serves only the function of prime mover to drive motor
#2 through the linkage. The position of the wheels on the radii of
the disks determines the speed at which motor #2 is driven by motor
#1. At the 100% setting of the control knob, both motors are turning
at 900 rpm. The rotor of motor #2 drives the capstan which transports
the tape through the Eltro device. 900 rpm angular velocity is equivalent
to 15 ips linear velocity, the nominal velocity at which original tapes
must be recorded to be converted by the Information Rate Changer.

Motor #2 is a synchronous type of an unusual configuration. It
is of four pole pair construction and its synchronous speed is 900 rpm.
This is significant, as the unusual feature of this motor is that the
field assembly is free to rotate, 60 cycle AC power being fed to the
field coils through brushes and slip rings. At 900 rpm, the motor's
synchronous velocity, all is quite normal; that is, the field assembly
is stationary as it would be in an ordinary synchronous motor. However,
if the linkage is varied by turning the control knob, the wheels move
to other locations on the disk radii, and motor #2 can be driven at
speeds other than its synchronous speed of 900 rpm -- actually from a
lower limit of 450 rpm to an upper limit of nearly 1800 rpm. As previously
mentioned, since the rotor of motor #2 drives the capstan, it provides
a variation in linear velocity from 7-1/2 ips to nearly 30 ips. To maintain
the synchronous field relationship between rotor and the field, the field
assembly must rotate, in a direction necessary to maintain a constant 900
rpm difference between the rotor and the field.

The field assembly is coupled mechanically to drive the rotating
head. The constant relative angular velocity of 900 rpm between the
rotor--which is the capstan--and the field--which is the head--is equivalent
to 15 ips linear velocity. The end effect is that the velocity of the
capstan and that of the rotating head are held in exact synchronism by
an electrical field lock and pitch change is impossible, regardless of
the capstan velocity.
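The differential lock between rotor and field can be illustrated with a short calculation. This is a sketch under assumptions not spelled out in the talk: the same 900 rpm to 15 ips conversion is applied to both the capstan and the head surface, a positive field speed means the head surface moves in the direction of tape travel, and the function and key names are hypothetical.

```python
SYNC_RPM = 900.0     # synchronous speed of motor #2
NOMINAL_IPS = 15.0   # linear velocity equivalent to 900 rpm


def drive_state(percent_of_original_time):
    """Rotor, field, capstan and head figures for one knob setting."""
    rotor_rpm = SYNC_RPM * 100.0 / percent_of_original_time
    # The field must turn so that rotor and field always differ by 900 rpm.
    field_rpm = rotor_rpm - SYNC_RPM
    capstan_ips = rotor_rpm * NOMINAL_IPS / SYNC_RPM
    head_ips = field_rpm * NOMINAL_IPS / SYNC_RPM
    return {
        "rotor_rpm": rotor_rpm,
        "field_rpm": field_rpm,
        "capstan_ips": capstan_ips,
        "head_ips": head_ips,
        "relative_ips": capstan_ips - head_ips,
    }


if __name__ == "__main__":
    for percent in (200, 100, 53):
        print(percent, drive_state(percent))
```

At every setting the relative tape-to-gap velocity stays at 15 ips, which is why pitch is unaffected: in compression the head follows the tape, in expansion it runs against it, and at 100% the field stands still.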

The  audio  frequency  signal  from  the  rotating  head  assembly  is  picked 
off  by  silver  alloy  brushes  in  contact  with  four  points  on  two  coin  silver 
slip  rings  on  the  bottom  of  the  head  and  fed  to   a  transistor  amplifier 
which  processes  the  signal  to  a  usable  level. 

In the printed copy of this talk are listed a dozen areas of technical
and operational improvement of the Eltro Mark II Information Rate
Changer over predecessor models. Research and development continue
toward the solution of certain problems which will extend the usefulness
of future models. For example, an extremely difficult problem is the
development of a head assembly providing 7-1/2 ips relative velocity.
A simpler development project is devoted to the production of an assembly
which will provide the tape handling function which must now be performed
by an associated tape machine. Research also continues along basic lines
with the goal of extending the compression/expansion range beyond its
approximate 1 to 4 ratio.

You  have  listened  patiently  to  this  discourse  concerning  a 
practical  manner  to  accomplish  speech  time  compression  and  expansion. 
Still  this  is  only  hardware  which  produces  an  end  result  which  must 
be  put  to  effective  use  by  specialists  in  communication.   Compressed 
speech is so much in its infancy that its true meaning and use are surely yet
to be revealed. It is even possible that we should view compressed
speech  as  a  new  art  form. 

Until recently we have held the opinion that no new software
would be needed for this technology. Many shared with us this opinion
that any or all recorded material could be compressed. We are no longer
so certain, particularly if the full potential of this technology is to
be realized. Several aspects are worthy of investigation. Should not
speakers be trained to project in a certain way if the final result is
to be in compressed form? Should not techniques to increase redundancy
be  used? 

The rate of accenting should be investigated. Should the accent
be  placed  on  the  paragraph  rather  than  the  sentence?   Can  more  comprehension 
result  if  accenting  does  not  proceed  at  a  higher  rate  than  that  to  which 
we  are    accustomed  through  direct  speech? 

These  are   all  determinations  which  must  come  from  research  conducted 
by  you  and  others  in  your  various  fields.   We  would  like  to  cooperate  with 
such  research  programs  in  every  way  we  can. 

Thank you.

Reference patent numbers:

3,047,673 

3,077,587 

2,977,423 

3,064,088 

2,996,583 

3,041,000

THE ELTRO INFORMATION RATE CHANGER
MARK II
TECHNICAL  AND  OPERATIONAL  IMPROVEMENTS  OVER  PREVIOUS  MODELS 

1)  Rotating  Head  Construction:   A  head  assembly  of  totally  new  design 
and construction is incorporated in the Mark II unit. Quadrature and
azimuth tolerances are held to closer limits. Output level has been
increased. Gap width has been reduced to 0.0002" extending frequency
response flat to 15 kc. The head assembly is replaceable with a 7-1/2
ips assembly when available. Head does not accumulate oxide rapidly
as with previous units.

2)  Head  Contact  Assembly:   A  new  type  of  long  life  brush  assembly  of 
silver  alloy  contacts  at  two  points  on  each  coin  silver  ring.   Noise 
has  been  reduced  considerably  and  the  need  for  lubrication  has  been 
obviated.

3)  Capstan  Top  Support  and  Sleeve:   The  capstan  is  fitted  with  a  removable 
sleeve  preparing  the  unit  for  7-1/2  ips  use  when  rotating  head  is  avail- 
able.  The  capstan  is  top  supported  for  low  whip  at  7-1/2  ips. 

4)  Remote  Control  Operation:   Provision  is  made  to  start  and  stop  the 
associated  tape  machine  when  the  function  control  is  moved  to  tempo 
position.

5)  New  Type  Capstan  Roller:   Moving  function  control  automatically 
brings  precision  pressure  roller  into  contact  with  capstan.   Function 
control  in  pitch  position  allows  rotation  of  head  only  for  pitch 
change. 

6) Integrated Electronics: The Mark II is equipped with an American
made transistorized NAB equalized amplifier bringing head output to
line level. Output is balanced and floating from a Cannon XLR connector.
Gain and tone controls are provided. Previous problems of matching
heads to electronics are absent.

7)  Calibration  of  Control  Knob:   Calibration  of  the  control  knob  of  the 
Mark II is in percentage of original time, not in percentage of increase
or  decrease  of  tape  travel  velocity.   Mathematical  conversion  is  no 
longer  necessary. 

8)  Linkage  Improvement:   The  gear  train  formerly  used  to  drive  the 
rotating  head  from  the  field  assembly  has  been  dropped  in  favor  of  a 
rubber  idler  and  flywheel.   By  depressing  the  control  knob  the  idler  can 
be  relieved  and  quick  changes  of  compression  rates  can  be  accomplished. 

9) New Tape Guides: Two new tape guides have been added to the Mark
II in close proximity to the playback head for improved flutter figure;
left guide height adjustment for azimuth, right guide eccentric
adjustment to change wrap angle.

10) Easier Height Adjustment: All idlers and tape guides easily adjusted
with Allen wrench supplied.

11) A Plastic Dust Cover is furnished which can be installed during
operation.

12) New Design Top Plate is functional and attractive. Cabinet in
walnut.
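
Item 7 above contrasts two calibrations of the control knob. As a rough illustration of the conversion that a velocity-based calibration would require, the sketch below assumes that playing time is inversely proportional to tape travel velocity. The function names are hypothetical and are not taken from any Eltro documentation.

```python
def percent_time_from_velocity_change(velocity_change_pct: float) -> float:
    """Percent of original playing time for a given percent change in
    tape travel velocity (assumes time is inversely proportional to speed)."""
    speed_ratio = 1.0 + velocity_change_pct / 100.0
    if speed_ratio <= 0:
        raise ValueError("velocity change must be greater than -100%")
    return 100.0 / speed_ratio


def velocity_change_from_percent_time(percent_time: float) -> float:
    """Percent change in tape travel velocity needed to play the material
    in the given percent of its original time."""
    if percent_time <= 0:
        raise ValueError("percent_time must be positive")
    return (100.0 / percent_time - 1.0) * 100.0
```

Under this assumption, a knob reading of 50 (percent of original time) corresponds to a 100 percent increase in velocity, which is the arithmetic the new calibration spares the operator.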


CHAPTER XII

Time  Factor  Alteration  of  Recorded  Sound  While 
Maintaining  Frequency  Constants. 

Wayne W. Graham*

Our  participation  in  this  meeting  is  based  on  the  work  we  are  doing 
to  provide  hardware  for  speech  compression. 

In  preparing  for  today's  presentation  we  assumed  you  would  be  more 
interested  in  what  the  equipment  will  do  rather  than  how  it  does  it.  Therefore, 
we  have  not  planned  to  give  a  technical  resume  of  our  work  but  have  brought 
recordings  to  demonstrate  our  product. 

We  are  most  pleased  to  have  this  opportunity  to  become  acquainted 
with  you  and  hope  we  can  in  turn  establish  a  continuing  relationship. 

Our  machine  is  based  on  the  original  design  developed  at  the 
University  of  Illinois.   Design  refinements  have  been  made  and  as  the  work 
continues  we  foresee  simplifications  in  prospect  which  will  probably  reduce 
the  cost  of  the  unit. 

During  the  last  three  years,  as  we  searched  for  design  improvements, 
we  have  on  several  occasions  become  aware  of  potentially  important  changes 
in  component  design.   We  have  been  forced  to  postpone  testing  these  because 
of  insufficient  funds  for  the  purpose.   All  that  can  be  said  at  the  moment 
in  this  connection  is  that  we  believe  we  can  have  a  future  model  with  a 
distortion factor so low that the product will be free of telltale evidence
of  equipment  deficiencies.   The  range  of  such  a  machine  will  be  from  zero 
change  to  the  maximum  useable. 

Let  us  now  play  a  few  samples  of  compressed  speech  after  which  we 
will be glad to answer your questions if we can.


*Mr. Wayne W. Graham is with Discerned Sound, Hollywood, California.


CHAPTER  XIII 
Problems  of  Measuring  Speech  Rate 
John B. Carroll*

Speech  rate  is  clearly  a  critical  variable  in  speech  compression,  both 
in  describing  the  input  to  any  speech  compression  system  and  in  characterizing 
the  output.   Reliable  and  meaningful  measures  of  speech  rate  are    "musts"  if  we 
hope  to  appraise  what  a  speech  compression  system  actually  does.   It  cannot  be 
assumed  that  just  because  a  system  has  a  certain  compression  ratio,  such  as  75%,  it 
will  always  produce  speech  at  a  certain  rate,  because  obviously  the  output  rate 
will  depend  upon  input  rate  as  well  as  the  compression  ratio.   This  fact  seems  to 
have  been  forgotten  by  some  of  the  more  ardent  exponents  of  speech  compression 
when  they  report  output  rates  of,  say,  500  wpm  by  merely  using  the  compression 
ratio  in  their  calculation,  without  making  careful  measurements  of  the  input 
rates.
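
The dependence of output rate on both input rate and compression ratio can be made concrete with a small sketch. It assumes that a compression ratio of 75% means the compressed recording plays in 75% of the original time. The function name and the sample input rates are illustrative only.

```python
def output_rate(input_rate: float, time_fraction: float) -> float:
    """Speech rate after compression, in the same units as input_rate.

    time_fraction is the compressed duration divided by the original
    duration (0.75 for the assumed reading of a "75%" ratio).
    """
    if time_fraction <= 0:
        raise ValueError("time_fraction must be positive")
    return input_rate / time_fraction


# The same compression setting applied to three different input rates.
for wpm in (125.0, 150.0, 175.0):
    print(f"{wpm:.0f} wpm in -> {output_rate(wpm, 0.75):.1f} wpm out")
```

Under that reading, the same 75% setting yields outputs from roughly 167 to 233 wpm over these inputs, so the ratio alone does not fix the output rate.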

As  I  will  show,  measuring  speech  rate  is  not  a  simple  matter  of  counting 
words  per  minute,  for  this  measure  can  give  misleading  results. 

Measurements  of  speech  rate  enter  our  considerations  at  another  point.   A 
tacit  assumption  in  most  speech  compression  work  is  that  this  process  will  have 
an  output  that  is  in  some  way  unusual,  i.e.  not  normally  producible  by  the  unaided 
speaker.   This  raises  the  question  of  what  is  in  fact  normal,  usual,  or  typical, 
and  also  the  question  of  where  the  dividing  line  is  between  what  is  normal,  usual, 
and  typical  and  what  is  abnormal,  unusual,  or  atypical.   To  answer  this  question  we 
need  not  only  reliable  normative  data  on  normal  speech  rates  but  also  psychophysical 
studies of subjective responses to speech at various rates (Hutton, 1954). Although
the literature contains some leads on these matters, it is not adequate for our
purposes.

Further,  we  need  to  know  something  about  the  ability  of  speakers  to  control 
their  speech  rates  at  designated  levels  or  to  vary  them  on  demand,  and  about  what 
changes  in  their  speech  take  place  at  different  rates.   This  is  undoubtedly  an 
important  matter  in  connection  with  the  production  of  speech  recordings  for  the  blind. 
Likewise  we  need  to  know  much  more  about  the  ability  of  listeners  to  comprehend 
materials  at  various  rates;  in  research  on  this  matter  the  different  procedures  in 
the  measurement  of  speech  rate  must  be  taken  carefully  into  account  in  interpreting 
the results.

In  this  talk,  I  will  consider  various  problems  of  measuring  speech  rate  as 
they pertain to what is ordinarily called oral reading rate, that is, the rate at
which a speaker reads aloud a continuous prose text which he has not necessarily
seen before. This is in contradistinction to spontaneous speech rate, i.e. the
rate of speaking when the speaker is continuously composing speech of a "novel"
character. I shall not deal with spontaneous speech rate, which entails some very
difficult  problems  of  measurement,  although  some  of  the  same  considerations  apply 
to  it  as  in  the  case  of  oral  reading  rate. 

Let  us  suppose  that  our  problem  is  to  determine  the  distribution  of  the 
oral  reading  rates  of  a  large  sample  of  educated  adult  speakers  of  English;  if  the 
problem  is  stated  in  this  way  it  entails  practically  all  the  problems  I  want  to 
discuss  and  excludes  such  problems  as  the  sampling  of  speakers,  with  which  I  am 
not concerned. I should note, however, that in reporting norms for reading rates,
the sampling of the speakers must be adequately specified in terms of age, sex, education,
intelligence, background, and any abnormal characteristics such as speech
handicaps, deafness, etc.

*Professor John B. Carroll is associated with the Department of Psychology
at Harvard University, Cambridge, Massachusetts.

First, we must know what we mean by rate. In the present context,
rate must be reported according to the following model:

x units of speech output per unit of time.

For  example,  we  might  report  words  per  minute,  syllables  per  second,  or 
even  pages  per  hour,  if  the  pages  are   of  some  specified  standard  size  and 
wordage.   We  shall  speak  of  the  problem  of  units  of  measurement  in  a 
moment.   Right  now,  I  call  your  attention  to  the  fact  that  the  model  given  as 

x  units  of  speech  output  per  unit  of  time 

is  not  the  same  as 

y  units  of  time  per  unit  of  speech  output. 

This  second  model  is  exemplified  by  such  measurements  as  minutes  per  300 
words, seconds per syllable, or minutes per page. To be sure, these two
models  are    related  to  one  another;  by  doing  the  appropriate  arithmetic, 
one  can  be  obtained  from  the  other.   However,  reporting  a  rate  in  terms 
of  the  second  model  does  not  give  measurements  that  can  be  as  readily 
apprehended  and  handled  statistically  as  by  the  first  model.   300  syllables 
per  minute  is  a  more  readily  grasped  measurement  than  the  equivalent 
statements ".33 minutes per 100 syllables" or "0.2 seconds per syllable".
Furthermore,  taking  arithmetic  averages  of  measurements  reported  in  output 
per  unit  of  time  is  legitimate,  while  taking  arithmetic  averages  of  measure- 
ments reported  in  time  per  unit  of  output  is  not.   In  fact,  the  two  averaging 
procedures  give  different  results;  if  the  second  kind  of  measurement  is  to 
be  averaged  at  all,  the  harmonic  mean  rather  than  the  arithmetic  mean  must 
be used.
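
The difference between the two averaging procedures can be checked numerically. The sketch below uses three hypothetical reading times for a 300-word passage (they are not data from any study cited here) and relies on Python's standard `statistics` module.

```python
from statistics import harmonic_mean, mean

PASSAGE_WORDS = 300
# Hypothetical times, in seconds, for three speakers reading the passage.
times_sec = [90.0, 120.0, 150.0]

# First model: convert each time to words per minute, then average.
rates_wpm = [PASSAGE_WORDS / t * 60.0 for t in times_sec]
mean_of_rates = mean(rates_wpm)

# Second model, averaged improperly: arithmetic mean of the times.
rate_from_arith_mean_time = PASSAGE_WORDS / mean(times_sec) * 60.0

# Second model, averaged with the harmonic mean of the times.
rate_from_harm_mean_time = PASSAGE_WORDS / harmonic_mean(times_sec) * 60.0

print(round(mean_of_rates, 2))
print(round(rate_from_arith_mean_time, 2))
print(round(rate_from_harm_mean_time, 2))
```

With these figures the arithmetic mean of the times implies 150 wpm, whereas the mean of the individual rates is about 156.7 wpm. Only the harmonic mean of the times reproduces the latter.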

All  this  is  important  because  a  surprising  number  of  investigations 
have used improper procedures. In the literature, the most common procedure
for  measuring  speech  rate  is  to  clock  a  speaker  while  he  reads  a  standard 
paragraph.   Thus,  Johnson  (1961)  clocked  the  times  (in  seconds)  to  read  a 
300-word  passage.   There  is  nothing  essentially  wrong  in  this,  but  when  he 
proceeded  to  average  these  times,  using  the  arithmetic  mean,  the  result 
was  misleading  since  it  was  not  the  same  as  the  average  he  would  have  gotten 
by  converting  the  times  to  the  form  words  per  minute.   Furthermore,  generally 
speaking,  measurements  taken  as  units  of  output  per  unit  of  time  are    normally 
distributed  over  persons  or  over  occasions,  while  measurements   taken  as  amount 
of  time  per  unit  of  performance  are    skewed  positively. 

If  very  small  time  units  are   used,  data  must  be  reported  to  sufficient 
precision to permit projections to larger units. For example, a measurement
reported as "3 syllables per second" is probably imprecise, unless it is
actually "3.000 syllables per second", from which one could reliably translate
to "180.0 syllables per minute".
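
A short sketch shows how much uncertainty a coarsely reported per-second figure carries when projected to a per-minute figure. It assumes the reported value was rounded to the nearest unit in its last stated decimal place. The function name is hypothetical.

```python
from typing import Tuple


def projected_range_per_minute(reported_per_second: float,
                               decimals: int) -> Tuple[float, float]:
    """Range of per-minute rates consistent with a per-second rate that
    was rounded to the given number of decimal places."""
    half_step = 0.5 * 10 ** (-decimals)
    low = (reported_per_second - half_step) * 60.0
    high = (reported_per_second + half_step) * 60.0
    return low, high


coarse = projected_range_per_minute(3.0, 0)   # reported as "3"
fine = projected_range_per_minute(3.0, 3)     # reported as "3.000"
```

Under this rounding assumption, "3 syllables per second" is compatible with anything from 150 to 210 syllables per minute, while "3.000" narrows the projection to within a few hundredths of 180.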

The  real  bugaboo  in  this  picture  is  the  question  of  units.   Units 
of  time,  at  least,  can  be  specified  without  difficulty  and  can  be  measured 
to  almost  any  desired  degree  of  precision  if  sufficient  precautions  are 
taken,  although  even  here,  errors  can  be  made.   I  remember  an  occasion 
when  one  of  my  assistants  timed  something  with  a  stopwatch  graduated  in 
100ths of a minute, under the impression that it was graduated in the
normal seconds and fifths of a second. Naturally the results were disastrous.
But when we consider units of speech output, there are some real problems.
It is apparently a standard convention to report speech rates in terms of
words per minute, and thus far in this talk I have spoken about words
per minute just in order to seem conventional. From the standpoint of
scientific measurement, however, the word is a very poor unit and I wish
that we could abolish wpm measurements. It is not a standard unit, for
words vary in length from the simple a to the classic
antidisestablishmentarianism. Furthermore, different samples of prose vary
considerably in the average length of their words, whether measured in terms
of syllables or in terms of phonemes. Number of syllables per word figures as an
element  in  various  measures  of  readability. 

Let  me  report  in  this  connection  an  experiment  that  Jeffrey  Sampson 
did  under  my  supervision  at  Michigan  recently.   Six  passages  from  American 
fiction  were  selected,  each  containing  between  300  and  320  words  and  from 
10  to  15  sentences.   In  most  respects,  the  passages  were  homogeneous  both 
within  and  among  themselves;  the  one  respect  in  which  they  were  deliberately 
allowed to vary was the syllable-to-word ratio (S/W). These ranged from
1.20,  for  a  passage  from  Ernest  Hemingway,  to  1.73  for  a  passage  from 
Henry  James.   Twenty-four  speakers  were  asked  to  read  these  passages 
aloud  "at  a  normal,  comfortable  rate"  and  they  were  timed.   The  order  in 
which  the  passages  were  read  was  systematically  varied  according  to  a 
Latin  square;  the  Ss  were  not  aware  that  the  passages  varied  in  S/W  ratio. 
The  times  for  individual  subjects  were  converted  into  words,  syllables 
or  phonemes  per  minute  and  then  averaged.   Table  1  and  Figure  1  show  the 
results. It is evident that words/min is systematically related to S/W;
syllables/min much less so, and phonemes/min still less so. Furthermore,
the coefficient of variation of the means over passages was greatest
in the case of words and least in the case of phonemes; in other words,
phonemes/min gave the most consistent results. Syllables do, of course,
vary in length, but average length of syllable probably does not vary
from text to text as much as word length. (For the six passages, phonemes/
syllable ranged from 2.71 to 2.54 and tended to be inversely correlated
with S/W ratio; evidently the syllables of longer words tend to be shorter
than the syllables of short words.) Nevertheless, since phonemes are
difficult to count, for practical purposes I recommend the syllable as
the  unit  of  speech  output  in  measuring  speech  rate. 
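
The comparison of units can be retraced from the passage means given in Table 1. The sketch below assumes the coefficient of variation was computed with the population standard deviation (divisor n), since that choice comes close to the tabled values. The variable and function names are illustrative.

```python
from statistics import mean, pstdev

# Mean oral reading rates for passages 1-6, as given in Table 1.
passage_means = {
    "words/min": [221.75, 224.47, 191.76, 187.16, 174.38, 169.85],
    "syllables/min": [265.81, 289.22, 262.80, 278.64, 280.77, 294.12],
    "phonemes/min": [719.60, 753.64, 696.60, 706.89, 721.09, 748.61],
}


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


cv = {unit: coefficient_of_variation(v) for unit, v in passage_means.items()}
for unit, value in cv.items():
    print(f"{unit:14s} C.V. = {value:.3f}")
```

The ordering (words largest, phonemes smallest) matches the C.V. column of Table 1, and the computed values lie close to the tabled ones.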

But  because  the  average  speech  rate  in  terms  of  even  syllable/ 
minute  varies  somewhat  with  the  nature  of  the  text,  it  is  probably  wise 
to  collect  norms  on  a  rather  wide  variety  of  texts,  before  we  can  accept 
really  representative  values. 

There  is  an  interesting  sidelight  from  this  study;  namely  that 
oral  reading  rates  differed  very  consistently  over  the  24  speakers. 
Correlations  of  rates  among  the  six  passages  ranged  from  .91  to  .98.   This 
means  that  each  speaker  maintained  a  very  consistent  rate  over  the  six 
passages, and these rates varied widely among speakers.

A number of investigators (e.g. Starkweather, 1960; Shearn et al.,
1961)  have  used  apparatus-produced  units;  i.e.,  they  report  the  rate  at 
which  the  intensity  of  the  speech  signal  passes  specified  levels  determined 
by  the  settings  of  electronic  gating  circuits.   Two  things  should  be 
recognized  about  these  measurements;  one,  they  may  be  affected  by  the  nature 
of  the  texts  read,  and  two,  they  are   certainly  affected  by  the  parameters 
of  the  electronic  circuits.   Such  measurements  should  always  be  accompanied 
by  equations  whereby  the  results  can  be  converted  to  more  standard  units 
such as syllables.

Let me speak more about the collection of normative data. The
rate  at  which  a  speaker  will  read  a  text  is  obviously  a  function  of  his 
set.   We  do  not  know  to  what  extent  a  speaker  can  consciously  control 
his  oral  reading  rate,  but  in  collecting  norms  we  can  distinguish  at 
least  three  distinct  sets  that  can  be  controlled  by  instructions.   One: 
we  can  ask  a  speaker  to  read  "as  fast  as  he  can";  Two:  we  can  ask  a 
speaker  to  read  "at  a  normal,  comfortable  rate".   (This  was  the  instruction 
used  in  the  data  collected  by  Sampson.)   Three:  we  can  ask  a  speaker  to 
read "so as to communicate"--"as if he were reading the material to a
friend,  making  sure  that  he  gets  the  meaning".   Schwartz  (1961)  demonstrated 
that  speakers  instructed  to  "read  so  as  to  communicate"  slowed  down 
considerably  from  the  "normal"  rate;  from  his  data  I  compute  that  the 
rate  was  on  the  average  about  87%  of  the  normal  rate.   (Schwartz,  inciden- 
tally, failed  to  report  rates  in  proper  form.   He  reported  his  measurements 
in  terms  of  average  number  of  minutes  to  read  5  sentences;  thus,  he 
committed three errors: (1) failure to specify units of speech output
in useful terms; (2) reporting rate as time per unit of speech output;
and (3) arithmetic averaging of times per unit of output.)

There  are    several  instances  in  the  literature  in  which  speakers 
were  reported  to  control  their  rates  at  designated  levels,  but  one  gets 
the  impression  that  this  was  accomplished  only  after  considerable  trial 
and  error.   Harwood's  speaker  (1955)  read  at  "carefully  controlled" 
rates  of  125,  150,  175  and  200  words  per  minute;  Goldstein's  speaker 
(1940) read at various rates from 100 to a surprising 322 wpm.* Note,
of  course,  the  unfortunate  use  of  words  as  the  units  in  these  measurements. 

I can report a bit of normative data using the "maximum" speed
set and the "normal" speed set. My subjects were 130 college and graduate
students, all native speakers of English. For a text of 125 words or 218
syllables (thus, syllable-to-word ratio = 1.74), the rates under the
"maximum" speed set were normally distributed with the following means
and standard deviations:

          Mean     S.D.

WPM      205.4     29.6
SPM      358.2     51.6

These data can be compared with the data for passage 6 in Figure 1, since
the syllable/word ratio was similar. For a text of 177 words or 297 syllables
(thus, syllable-to-word ratio = 1.68), the rates under the "normal" speed
set were normally distributed with the following means and standard deviations:

          Mean     S.D.

WPM      172.2     19.2
SPM      289.2     32.2

The  latter  data  are    consistent  with  the  data  reported  in  Figure  1  if  account 
is  taken  of  the  S/W  ratio. 
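
Taking account of the S/W ratio amounts to a simple unit conversion, since every speaker read the same text. The sketch below converts the reported mean rates in words per minute to syllables per minute using the word and syllable counts of each text. The function name is hypothetical.

```python
def syllables_per_minute(words_per_minute: float,
                         n_words: int,
                         n_syllables: int) -> float:
    """Convert a words-per-minute rate on a given text to syllables per
    minute, using that text's syllable-to-word ratio."""
    return words_per_minute * n_syllables / n_words


# "Maximum" speed set: 125 words, 218 syllables, mean 205.4 wpm.
max_spm = syllables_per_minute(205.4, 125, 218)
# "Normal" speed set: 177 words, 297 syllables, mean 172.2 wpm.
normal_spm = syllables_per_minute(172.2, 177, 297)
```

Both converted values agree closely with the SPM means reported above.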

The  literature  contains  a  number  of  reports  of  oral  reading  rates 
(Parley, 1940; Gibbons et al., 1958; Peters, 1954; Henze, 1953; Starkweather
and Hargreaves, 1964), but in most cases the reports are difficult to
interpret because they are only in terms of words per minute or because
insufficient information is given as to the instructions to the speakers,
the nature of the texts read, and other relevant variables.

*Actually, this last rate (322 wpm) was created artificially by
speeding up a phonograph record.

I  will  conclude  by  saying  that  in  this  brief  span  of  time  I  have 
been  unable  to  cover  the  literature  available  or  to  mention  all  the 
variables  that  appear  to  be  relevant.   I  have,  however,  tried  to  identify 
the  main  issues  that  must  be  reckoned  with  in  measuring  speech  rates, 
whether they are inputs or outputs of speech compression systems.

Table 1
Data on Oral Reading Rates for Six Passages
Varying in Syllable/Word Ratio, by Three Methods of Measurement,
from 24 Adult Subjects.

                                        Passage
                           1       2       3       4       5       6

Words                    302     319     305     319     318     313
Syllables                362     411     418     475     512     542
Phonemes                 980    1071    1108    1205    1315    1380
Syll./Wd.               1.20    1.29    1.37    1.49    1.61    1.73
Phon./Syll.             2.71    2.61    2.65    2.54    2.56    2.54

Oral Reading Rates                                                  Means Over
                                                                      Passages

Words/Min.      M     221.75  224.47  191.76  187.16  174.38  169.85    194.90
                σ      30.71   26.63   24.35   21.92   21.40   21.69     21.22
                M/M     1.14    1.15     .98     .96     .89     .87    C.V. =
                σ/M      .16     .14     .12     .11     .11     .11      .109

Syll./Min.      M     265.81  289.22  262.80  278.64  280.77  294.12    278.56
                σ      36.82   34.31   38.38   32.60   34.46   37.56     11.34
                M/M      .95    1.04     .94    1.00    1.01    1.06    C.V. =
                σ/M      .13     .12     .12     .12     .12     .13      .041

Phonemes/Min.   M     719.60  753.64  696.60  706.89  721.09  748.61    724.41
                σ      99.67   89.40   88.49   82.73   88.48   95.64     20.28
                M/M      .99    1.04     .96     .98    1.00    1.03    C.V. =
                σ/M      .14     .12     .12     .11     .12     .13      .028


The  passages  were  slightly  edited  versions  of  selections  as  follows: 


1.  Ernest  Hemingway,  The  Old  Man  and  the  Sea.  (1952) 

2. Sherwood Anderson, Winesburg, Ohio. (1919)

3.  F.  Scott  Fitzgerald,  Tender  is  the  Night.  (1933) 

4.  Thomas  Wolfe,  Look  Homeward,  Angel.  (1929) 

5.  J.  D.  Salinger,  Franny  and  Zooey.  (1955) 

6. Henry James, Daisy Miller. (1878)

CHAPTER XIV

The  Use  of  Bandwidth  and  Time  Compression 
for  the  Hearing  Handicapped 


W. R. Zemlin*


Under  normal  conditions,  adequate  perception  of  speech  signals,  and 
their  salient  subtleties,  may  be  achieved  provided  the  auditor  has  a  phy- 
siologic receptor  with  a  band-width  of  from  about  90  to  8000  Hz.   Under 
conditions  less  favorable  than  normal,  very  adequate  discrimination  (based 
on  an  operational  criterion)  may  be  obtained  with  a  receptor  band-width 
of from 250-3000 Hz, or even less. One example of a less favorable condition
is hearing impairment.

For  descriptive  purposes,  and  for  purposes  of  expediency,  hearing 
impairments  are    often  classified  as  high  frequency  losses  or  low  frequency 
losses (a functional classification) -- or conductive, sensori-neural,
or mixed losses (a physiological classification).

The  fields  of  experimental  and  clinical  audiology  have  rather  con- 
sistently demonstrated  that  pathology  of  the  conductive  mechanism  for  the 
most  part  results  in  either  a  broad  band  frequency  hearing  loss,  or  a 
predominantly low frequency loss, while sensori-neural pathology results
in predominantly high frequency hearing losses. Regardless of the severity
of the pathology in the case of pure conductive loss, the band-width of
the end organ is largely retained and hearing loss can be compensated for
by amplification of the speech signal. In other words, conductive pathology
only affects the receptiveness of the auditor, and not his band-width.

Hearing  loss  due  to  marked  end  organ  pathology  or  nerve  damage  is 
not  readily  compensated  for  by  either  broad-band  amplification  or  by  shaped 
amplification. Not only is the receptiveness of the auditor decreased,
but the band-width is narrowed. Thus far, it has been unrewarding to attempt
to widen a narrow band pass of the abnormal ear through amplification.
Improvements in speech discrimination might, however, be possible through
more effective use of the remaining receptor mechanism.

This  paper  deals  with  some  attempts  to  arrive  at  a  technique  which 
would  permit  exploitation  of  the  restrictive  band-width  of  persons  with 
sensori-neural loss.

Research  has  consistently  demonstrated  that  vowel  recognition  seems 
not  so  dependent  upon  the  exact  frequency  locations  of  the  formant  bands, 
as it is upon their proportional relationships. If it is the relative,
rather than the absolute, formant location of the vowel sounds that matters,
it ought to be possible to shift the entire speech spectrum downward on the
frequency scale, retaining proportionality and yet bringing the absolute frequencies
within  the  band  pass  of  the  pathologic  hearing  mechanism. 
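
The arithmetic behind this proposal is that multiplying every frequency by a common factor leaves the ratios among the formants unchanged. The sketch below illustrates this with formant values chosen purely for illustration (they are not measurements reported in this paper). The function names are hypothetical.

```python
def shift_formants(formants_hz, factor):
    """Scale every formant frequency by the same factor, as slow-playing
    a recording does."""
    return [f * factor for f in formants_hz]


def formant_ratios(formants_hz):
    """Each formant expressed relative to the first formant."""
    first = formants_hz[0]
    return [f / first for f in formants_hz]


# Illustrative first three formants of a vowel, in Hz (assumed values).
original = [730.0, 1090.0, 2440.0]
shifted = shift_formants(original, 0.5)

print(shifted)
print(formant_ratios(original))
print(formant_ratios(shifted))
```

In this illustration the highest formant moves from 2440 Hz to 1220 Hz while the proportional pattern is retained, which is the property the argument depends on.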

The  idea  of  frequency-shift  or  band-width  compression  is  not  new. 
In  1929  Fletcher  reported  the  intelligibility  of  standard  articulation  tests 
when played at various phonograph turntable speeds. He reported that a speed
of .60 that of normal produced an articulation score of 41%.

Ochiai  and  his  colleagues  thoroughly  investigated  the  effects  of 
frequency shift on the articulation scores of Japanese speech. The results
indicate that an articulation score of 70% could be obtained with a
slow-play ratio of .50. The speakers were adult males. The results also indicated
that considerably better articulation scores were obtained when frequency
shift was employed on recordings of adult female speech and of the speech
of children.

In 1956, Kurtzrock investigated time and frequency distortion, employing
slow-played speech and frequency expansion, using the compression-expansion
device developed by Fairbanks, Everitt, and Jaeger. Kurtzrock concluded,
among other things, that frequency change is of more importance than time
change alone in affecting articulation scores, and that vowels are more
affected than consonants by frequency distortion.

*Dr. W. R. Zemlin is Director of the Speech and Hearing Research
Lab., 338 Illini Hall, University of Illinois, Champaign, Illinois.

In  1961  Tiffany  and  Bennett  also  investigated  the  intelligibility 
of slow-played speech, which consisted of word lists made up of repetitions
of  10  different  vowels  in  an  (H-D)  context.   Male  and  female  readers  were 
recorded.   Results  indicate  that  although  articulation  scores  decreased  with 
slow-played  speech  they  were  significantly  less  affected  by  distortion  of 
the  female  speech.   A  second  experiment  indicated  that  although  articulation 
scores  suffer  from  slow-played  speech,  considerable  improvement  can  be 
obtained by training. On the basis of this and other studies, it would seem
that  it  is  not  frequency  change  per  se  which  is  a  critical  factor,  but  the 
degree  to  which  the  formant  locations  remain  within  the  realm  of  the  listeners' 
common  experience  of  human  speech.   When  this  realm  is  transcended,  intelligibility 
suffers. It is well known that two important parameters of speech signals are
the  distribution  of  energy  along  the  frequency  scale,  and  the  temporal  relation- 
ships of  successive  phonemes.   In  most  of  the  previous  experiments  both  of  these 
parameters  have  been  subjected  to  distortion. 

If  recorded  speech  is  slow-played  at  one-half  its  normal  speed,  the 
time  scale  is  doubled,  and  frequencies  are    lowered  by  an  octave.   In  a  series 
of  experiments  at  the  University  of  Illinois,  slow  played  speech  (termed  band- 
width compression)  has  been  combined  with  a  proportional  amount  of  time 
compression. In this manner the desired frequency shift may be obtained
without  the  effects  of  time  distortion. 
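
The relationship between slow-play and the time compression needed to offset it can be sketched as follows. The sketch assumes an ideal slow-play in which frequency scales directly with playback speed and duration scales inversely. The function names are hypothetical.

```python
import math


def slow_play(duration_sec: float, speed_ratio: float):
    """Effect of playing a recording at speed_ratio times normal speed.

    Returns (frequency_factor, new_duration_sec, octave_shift).
    """
    if speed_ratio <= 0:
        raise ValueError("speed_ratio must be positive")
    return speed_ratio, duration_sec / speed_ratio, math.log2(speed_ratio)


def time_fraction_to_restore(speed_ratio: float) -> float:
    """Fraction of the slow-played duration that must be retained by time
    compression in order to recover the original duration."""
    return speed_ratio


freq_factor, slowed_duration, octaves = slow_play(10.0, 0.5)
restored_duration = slowed_duration * time_fraction_to_restore(0.5)
```

For half-speed playback of a 10-second sample, the frequencies fall by one octave and the duration doubles to 20 seconds. Compressing the slowed recording to half its length restores the original 10 seconds while keeping the lowered frequencies.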

The  first  experiment  was  essentially  a  replication  of  one  conducted 
by  Tiffany  and  Bennett,  but  using  stimulus  materials  which  had  been  subjected 
to  both  band-width  and  time  compression.   On  the  first  trial,  using  randomized 
lists  of  (H-D)  words,  a  group  of  college  students  obtained  an  articulation 
score  of  50%.   After  three  additional  trials,  the  score  had  increased  to  86%, 
as  compared  with  the  initial  results  of  21%  and  final  results  of  35%,  obtained 
by Tiffany and Bennett. These differences seem to have two implications: a)
even  though  articulation  scores  suffer  from  distortion,  some  learning  can  take 
place  with  repeated  presentations  of  the  materials,  and  b)  restoration  of  the 
time  element  seems  to  result  in  an  increase  in  the  articulation  scores. 

In a second experiment, the same band-width time-compressed materials
were presented to a group of hearing handicapped children who ranged in age
from 7 to 13 years. They had various medical histories, but for the most
part suffered from sensori-neural losses. Some used hearing aids, some didn't
need them, and others couldn't be helped by amplification. The group mean,
when presented with an undistorted list, was 55%. This low articulation score
can be partly accounted for by the fact that some of the words in the list
(e.g., /hoyd/) were not in the children's repertory, and were very likely
nonsense material. The initial band-width time-compressed trial yielded a
score of 16% and the final, fourth trial, yielded a score of 38%--values
similar to those obtained by Tiffany and Bennett with a normal adult population.

At the present time, an experiment being conducted as part of a
doctoral dissertation is underway at the University of Illinois. Recordings
of standardized discrimination tests have been subjected to 30, 40, and 50%
bandwidth compression, and also to 30, 40, and 50% bandwidth compression
with time restored by means of the familiar speech compression techniques.
These recordings, and modified recordings, have been used as stimulus
materials  for  500  grade  school  children,  in  an  attempt  to  arrive  at  answers 
to  the  following  questions: 

a)  With  normal  children  as  auditors,  to  what  extent  does  bandwidth 
compression  affect  discrimination  scores? 

b)  To  what  extent  does  restoration  of  time  affect  discrimination 
scores? 

c)  Is  there  a  combination  of  band-width  and  time  compression  which 
will  facilitate  discrimination  in  a  hearing  handicapped  population? 

Thus  far,  only  an  answer  to  the  first  question  seems  to  have  emerged 
from  the  data.   When  a  CID  W-22  monosyllabic  word  list  was  given  to  a  control 
group (undistorted), the mean percentage of correct responses was 67%.
With 30% band-width compression, the scores dropped to 50%. At 40% compression,
the  scores  were  26%  and  at  50%  compression,  they  had  dropped  to  a  low  of  12%. 
The  remaining  questions  await  further  interpretation  of  the  data.   In  summary, 
these  experiments  and  others  seem  to  indicate  recognition  of  speech  material 
is  dependent  upon  frequency  patterns  rather  than  frequency  distribution  per  se; 
that  restoration  of  time  in  band-width  compressed  speech  at  least  partly  overcomes 
the  distortion  effects;  and  finally,  frequency  shifts  beyond  a  certain  amount, 
regardless  of  time  restoration,  seem  to  transcend  a  listener's  realistic  and 
previous  language  experience,  and  the  listener's  common  experience  of  human 
speech.   When  this  realm  is  transcended,  intelligibility  suffers. 

CHAPTER XV

Speech  Compression  By  Tape  Loop  And  By  Computer 
W. D. Chapman*

The  techniques  for  time-compressing  speech  to  be  described  in  this 
paper  were  designed  specifically  for  compressing  short  segments  of  speech. 
As such they are not immediately applicable to continuous compression of
large  quantities  of  speech.   However,  some  interesting  facts  have  been 
revealed  as  a  result  of  compressing  short  segments  that  may  be  useful 
for  the  large  scale  problem. 

The  task  at  hand  was  to  compress  individual  words  to  fit  the 
one-half  second  time  slot  of  the  IBM  7770  Audio  Response  Unit.   This 
device  provides  information  output  from  a  computing  system  by  means  of 
speech  messages,  rather  than  the  more  typical  forms  of  graphic  output. 
The  device  stores  up  to  128  prerecorded  words  on  a  magnetic  drum  that 
rotates  once  every  half-second.   Each  word  occupies  one  track  on  the 
drum,  and  a  message  is  composed  under  computer  control  by  selecting  the 
appropriate  sequence  of  word  tracks  to  form  the  desired  message.   The 
first  recording  I  will  play  demonstrates  how  the  unit  sounds.  (RECORDING) 

Since  each  word  must  occupy  no  more  than  one  revolution  of  the 
drum,  it  must  be  recorded  in  less  than  500  milliseconds.   This  time  slot 
is sufficiently long for monosyllabics and a number of polysyllabics
when the words are uttered in isolation at a normal pace. However,
typically 30 percent of the words for a given vocabulary are longer than
500 milliseconds, and thus require time compression to fit within the
time slot.

This  paper  will  describe  two  separate  techniques  for  compressing 
words. The first is pitch period compression on a digital computer, and
the second we will call "asynchronous analog compression". Chronologically,
we employed pitch period compression first, and then changed to the
asynchronous analog method for reasons to be described later; therefore,
this paper will deal first with pitch period compression.

Assuming  that  the  audience  is  familiar  with  the  fundamental  problem 
of  time  compressing  speech,  it  will  be  treated  here  only  briefly.   Simply 
stated,  to  eliminate  the  "Donald  Duck"  effect  of  linearly  time-compressed 
speech,  the  object  is  to  retain  the  original  integrity  of  the  short-term 
average  spectrum  by  one  of  two  methods.   The  first  method  is  to  remove 
time  segments  of  the  original  waveform  while  retaining  time  segments  of 
sufficient  duration  to  adequately  specify  for  human  perception  the  short- 
term  average  spectrum.   The  two  techniques  described  in  the  present  paper 
are both of this type. The second method is to time compress linearly
and then regain the integrity, or appropriate position of the short-term
average  spectrum  by  translating  the  resulting  error  spectrum  to  its  correct 
position  in  the  frequency  domain.   The  BTL  harmonic  compressor  and  various 
vocoding  techniques  fall  into  this  category. 

Time  segments  may  be  deleted  in  a  variety  of  ways  with  a  variety 
of  segment  durations.   Periodic  sampling  by  such  techniques  as  the  rotating 
head  approach  is  performed  without  regard  to  the  instantaneous  waveform. 
The  key  problem  here  is  to  join  the  ends  of  the  time  waveform  where  the 
sample has been deleted. Since severe discontinuities may occur in the
instantaneous  waveform,  it  is  necessary  to  provide  some  method  of  cross- 
fading  the  two  signals  to  reduce  the  resulting  abrupt  transient.   If  the 
period of the sampling function were synchronous with the period or fundamental
frequency of the voice, it would be possible to eliminate most
of  the  transients  at  the  junctions.   Such  synchronism  is  not  easy  to 
obtain  in  real  time,  however,  since  it  is  necessary  to  have  a  precise 
specification  of  the  fundamental  frequency  and  to  note  the  points  in 
the  amplitude  function  where  waveform  matches  may  be  determined  precisely. 
It  is  profitable,  therefore,  to  convert  the  analog  speech  signal  into 
its  digital  representation  and  store  it,  either  on  a  digital  tape  or  in 
the core memory of the computer where it may be processed in a precise
manner.
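
The cross-fading idea mentioned above can be illustrated with a short sketch. It is not the procedure used in this work; the function name `splice_with_crossfade`, the linear ramp shape, and the representation of the waveform as a plain list of samples are all assumptions made for illustration.

```python
def splice_with_crossfade(samples, cut_start, cut_end, fade_len):
    """Remove samples[cut_start:cut_end] and cross-fade across the joint.

    Over `fade_len` samples the outgoing signal (just before the cut) is
    ramped down while the incoming signal (just before the resume point)
    is ramped up, and the two are summed. Linear ramps are an assumption.
    """
    if not 0 <= cut_start <= cut_end <= len(samples):
        raise ValueError("cut must lie inside the signal")
    if fade_len < 0 or fade_len > cut_start:
        raise ValueError("fade does not fit before the cut")

    fade_from = cut_start - fade_len      # first outgoing sample in the fade
    lead_in = cut_end - fade_len          # first incoming sample in the fade
    blended = []
    for i in range(fade_len):
        w_in = (i + 1) / (fade_len + 1)   # incoming weight rises toward 1
        w_out = 1.0 - w_in                # outgoing weight falls toward 0
        blended.append(w_out * samples[fade_from + i]
                       + w_in * samples[lead_in + i])
    # The output is shorter than the input by exactly cut_end - cut_start.
    return samples[:fade_from] + blended + samples[cut_end:]
```

With `fade_len` set to zero the two ends are simply butted together, which is the case that produces the abrupt transient described in the text.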

The  first  step  in  pitch  period  compression  is  to  calculate  the 
location  of  the  pitch  periods.   This  may  be  accomplished  in  several  ways. 
One  technique  is  to  observe  the  rate  of  zero-crossings  of  the  waveform 
and  search  for  points  where  the  rate  increases  or  decreases  repetitively. 
Another  technique  is  to  search  for  points  of  maximum  excursion  rate  or 
peak  amplitude  of  the  time  waveform,  and  then  measure  the  distance  between 
these  points.   There  are  a  variety  of  other  techniques  which  also  suffice. 
The  main  point  of  the  fundamental  period  measurement  is  to  locate  points 
in  the  waveform  where  repeatable  functions  occur.   These  points  are  not 
necessarily the so-called "beginning" of the pitch period. When it is
impossible  to  determine  the  precise  period  length,  a  sampling  period  is 
arbitrarily  established  which  is  the  average  of  the  detectable  periods  in 
the  immediate  vicinity.   This  technique  also  establishes  sampling  points 
in voiceless sounds, such as the fricatives.
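
As a rough illustration of the peak-amplitude approach, the following sketch marks the dominant peak of each period and measures the distance between marks. It is only one of the several possible techniques mentioned, and every name in it (`synthetic_voiced`, `find_pitch_marks`, `period_lengths`, `fallback_period`) as well as the threshold of 0.6 of the maximum is an assumption for illustration, not the program actually used.

```python
import math

def synthetic_voiced(period, n_periods, ring=16, decay=20.0):
    """Toy voiced waveform: a damped cosine restarted every `period` samples."""
    out = []
    for n in range(period * n_periods):
        k = n % period
        out.append(math.exp(-k / decay) * math.cos(2.0 * math.pi * k / ring))
    return out

def find_pitch_marks(samples, threshold_ratio=0.6, min_spacing=1):
    """Indices of local maxima that exceed a fraction of the largest sample."""
    if len(samples) < 3:
        return []
    threshold = threshold_ratio * max(samples)
    marks = []
    for n in range(1, len(samples) - 1):
        is_peak = samples[n - 1] < samples[n] >= samples[n + 1]
        if not is_peak or samples[n] < threshold:
            continue
        if marks and n - marks[-1] < min_spacing:
            # Too close to the previous mark: keep whichever peak is larger.
            if samples[n] > samples[marks[-1]]:
                marks[-1] = n
            continue
        marks.append(n)
    return marks

def period_lengths(marks):
    """Distances between successive marks, in samples."""
    return [b - a for a, b in zip(marks, marks[1:])]

def fallback_period(periods):
    """Average of nearby detected periods, for stretches (such as
    fricatives) where no period can be measured."""
    return sum(periods) / len(periods) if periods else None
```

Note that the marks fall on repeatable points of the waveform (here, the largest peak), which need not coincide with the nominal start of the pitch period.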

The  next  step  in  pitch  period  compression  is  to  determine  the 
required  percentage  reduction  in  the  duration  of  the  word.   A  number  of 
pitch periods are then removed from the word uniformly throughout its
duration to reduce it to the desired length. For example, if it is
desirable to reduce the length of the word by 25 percent, then every
fourth period is removed from the digital record. The remaining ends
of the time waveform are then joined in the actual compression process.
Expansion  of  word  length  is  also  possible,  of  course,  by  repeating  pitch 
periods  an  appropriate  number  of  times.   The  digital  record  of  the  word 
is  then  reconverted  to  its  analog  form  and  stored  on  the  track  of  the 
analog  drum  for  use  by  the  audio  response  unit. 
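
The deletion and repetition steps just described can be sketched as follows. This is a minimal illustration under stated assumptions: the period boundaries (`marks`) are taken as already known, the ends are simply butted together with no cross-fade, and the names `drop_interval_for`, `delete_periods`, and `repeat_periods` are hypothetical.

```python
def drop_interval_for(word_ms, slot_ms=500.0):
    """Choose k so that removing every k-th period fits the word in the slot.

    Removing every k-th period shortens the word by about 1/k, so k is the
    largest integer not exceeding word / (word - slot). Returns None when
    the word already fits. A result of 1 would mean deleting everything.
    """
    if word_ms <= slot_ms:
        return None
    return int(word_ms // (word_ms - slot_ms))

def delete_periods(samples, marks, drop_every):
    """Remove every `drop_every`-th pitch period.

    Period i spans samples[marks[i]:marks[i + 1]]. Material before the
    first mark and after the last mark is kept untouched.
    """
    if drop_every < 2:
        raise ValueError("drop_every must be at least 2")
    if not marks:
        return list(samples)
    out = list(samples[:marks[0]])
    for i in range(len(marks) - 1):
        if (i + 1) % drop_every == 0:
            continue                      # skip this whole period
        out.extend(samples[marks[i]:marks[i + 1]])
    out.extend(samples[marks[-1]:])
    return out

def repeat_periods(samples, marks, repeat_every):
    """Expansion: play every `repeat_every`-th pitch period twice."""
    if repeat_every < 1:
        raise ValueError("repeat_every must be at least 1")
    if not marks:
        return list(samples)
    out = list(samples[:marks[0]])
    for i in range(len(marks) - 1):
        period = samples[marks[i]:marks[i + 1]]
        out.extend(period)
        if (i + 1) % repeat_every == 0:
            out.extend(period)
    out.extend(samples[marks[-1]:])
    return out
```

For a 640 millisecond word and a 500 millisecond slot, `drop_interval_for` selects every fourth period, the 25 percent case used as the example in the text.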

The  speech  produced  by  pitch  period  compression  may  be  as  precise 
and  noise-free  as  the  user  wishes,  depending  on  the  level  of  sophistication 
employed  in  the  pitch  period  detection  process.   The  integrity  of  the 
spectrum  has  remained  essentially  unaltered,  since  a  majority  of  the 
pitch periods are retained in their entirety. Limitations on the quality
are the same as those encountered by other equally sophisticated approaches
in either the time or frequency domains. For example, it is still possible
to  remove  whole  speech  events,  such  as  the  aspiration  after  a  stop  consonant 
which  occurs  in  a  shorter  interval  than  the  average  pitch  period.   Further, 
where  the  formant  structure  exhibits  pronounced  dynamic  characteristics, 
it  is  possible  to  encounter  discontinuities  in  the  formant  frequencies 
that are audible and sometimes mask the intelligence when high compression
is  desired.   Although  pitch  period  compression  is  not  necessarily  limited 
to  individual  words,  another  drawback  to  the  technique  just  described 
is  that  it  is  expensive  to  implement.   Processing  approximately  100  words 
may  cost  as  much  as  500  to  1000  dollars,  depending  on  how  many  times  the 
words  must  be  processed  to  obtain  usable  samples.   For  running  text,  it 
would  cost  perhaps  300  dollars  per  minute  of  original  speech. 

Prohibitive cost and processing time led to the investigation
of the second technique -- asynchronous analog compression. Since the
task at hand was to compress words individually, it was possible to think
in terms of storing the word on a tape loop for repeated examination and
extraction of time segments. Slide One gives an overall view of the
instrument that was designed and built to do this task. It is called a
Speech Analog Compression and Editing Loop, or SPACELOOP for short.

The  word  or  phrase  to  be  compressed  is  recorded  on  a  tape  loop, 
located  on  the  top  of  the  device.   The  whole  mechanism  is  roughly  similar 
to  a  two-channel  tape  recorder,  with  a  few  notable  exceptions.   Slide 
Two is an enlarged view of the tape transport area, and shows a series
of  seven  reproduce  heads,  set  to  operate  on  the  lower  of  two  recording 
tracks.   Six  of  these  heads  may  be  adjusted  laterally  along  the  mounting 
track.   The  seventh  is  usually  fixed  in  the  first  position  as  a  combination 
record  and  reproduce  head.   The  eighth  head  performs  a  timing  function 
and  is  set  to  operate  on  the  upper  of  the  two  tracks.   Tape  motion  is 
from left to right on the front side of the loop.

Another  notable  difference  in  this  tape  recorder  is  that  the 
output  from  the  system  may  come  from  each  of  the  seven  reproduce  heads 
by turning them on and off in sequence. If we reproduce first from
the  head  at  the  right  and  then  during  the  recording  shift  the  output 
from  that  head  to  the  next  in  line,  a  portion  of  the  recording  will  be 
deleted  or  skipped  over  which  is  equivalent  to  the  recorded  length 
between  the  two  heads.   The  duration  of  the  segment  deleted  is  directly 
proportional  to  the  distance  between  the  heads  and  inversely  proportional 
to the tape velocity. For a given compression task the tape velocity
is held constant, so the segment length is controlled only by the
distance between the heads. The standard linear tape velocity is 100
inches per second, and the loop is 100 inches in length, affording a
rotation time of one second. At this velocity, one inch of tape is
equivalent  to  10  milliseconds  of  time  in  the  recording.   For  example, 
if  the  reproduce  output  is  switched  from  the  first  head  to  the  second, 
and the head spacing is 3.3 inches, then 33 milliseconds will be removed
in  the  switching  process.   In  a  similar  manner,  the  output  may  be 
switched  from  the  second  head  to  the  third,  deleting  still  another  segment 
of  the  recording,  and  so  on  until  the  desired  number  of  segments  have 
been  deleted.   By  adjusting  the  position  of  the  heads  appropriately, 
the duration of the skipped segments may be selected.
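
The relation between head spacing, tape velocity, and deleted duration is simple enough to write down directly. The sketch below only restates the arithmetic given in the text; the function names are hypothetical, and the default values are the 100 inch loop and 100 inches per second quoted above.

```python
def deleted_ms(head_spacing_in, tape_speed_ips=100.0):
    """Duration skipped when the output moves to the next head.

    Directly proportional to the head spacing (inches) and inversely
    proportional to the tape velocity (inches per second).
    """
    return 1000.0 * head_spacing_in / tape_speed_ips

def loop_period_s(loop_length_in=100.0, tape_speed_ips=100.0):
    """Time for one revolution of the tape loop, in seconds."""
    return loop_length_in / tape_speed_ips

def total_deleted_ms(spacings_in, tape_speed_ips=100.0):
    """Total duration removed by switching through a series of heads."""
    return sum(deleted_ms(s, tape_speed_ips) for s in spacings_in)
```

At the standard velocity, `deleted_ms(1.0)` corresponds to the 10 milliseconds per inch stated in the text; halving the tape velocity would double the duration removed for the same head spacing.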

There  remains,  however,  one  important  additional  aspect  to  the 
segment deletion process. The length of time that a particular head
is used determines the duration of the time segment retained in the
reproduced output. These durations are controlled by individual
timing potentiometers, shown at the lower left on the front panel.
One potentiometer is associated with each reproduce head, except for
the last, and is calibrated in milliseconds with a three digit accuracy.

A  timing  pulse  is  recorded  at  one  position  on  the  second  track 
to  reset  the  output  to  the  first  head  before  each  revolution  of  the  tape 
loop.   Initially,  the  position  of  this  pulse  is  automatically  determined 
by  a  word-start  circuit  which  detects  the  beginning  of  each  new  word 
recorded  on  the  audio  track  and  places  a  DC  doublet  just  prior  to  this 
point on the second track. Subsequently, this single pulse is employed
to obtain exact and repeatable time positioning of the head switching
operations.

The  means  for  selecting  appropriate  points  for  segment  deletion 
are  twofold.   First,  the  recording  is  played  back  through  a  standard 
monitor  circuit  so  that  the  audible  effects  of  any  deletion  process 
may  be  heard  as  soon  as  it  is  set.   Second,  it  has  been  found  convenient 
to portray the output on a storage oscilloscope whose sweep is synchronized
to the revolution of the tape loop. It is possible to make a rough determination
of the time location of certain phonetic events by observing
the  envelope  of  the  amplitude  waveform. 

Now  let  us  examine  some  typical  cases  of  compression.   If  the 
heads  are  spaced  equally  along  the  deletion  track,  the  duration  of  the 
deleted segments will be equal for as many deletions as are employed.
At the tape velocity of 100 inches per second, a total of up to 200
milliseconds may be deleted; the minimum deletion interval is seven
milliseconds. If the timing potentiometers are all set equal to 50
milliseconds, for example, then the duration of the retained segments
will  be  equal,  down  to  the  last  head.   The  total  duration  which  may  be 
retained  is  variable,  of  course,  ranging  from  retaining  the  entire 
word  to  deleting  the  entire  word.   If  equal  spacing  of  the  heads  and 
equal  timing  of  the  retained  segments  were  employed,  then  the  reproduced 
signal would be very similar to that obtained with the rotating-head
time compressors, with one notable exception. In the SPACELOOP, the
switching operation is a function of a variable gain amplifier or
gate, and not a function of the tape wrap-around on the heads, as in
rotating-head systems. Typically, a two millisecond transfer time is
employed,  during  which  the  output  of  one  head  is  turned  off  in  an 
inverse  log  manner,  while  the  next  head  is  turned  on  with  a  mirror- 
image  switching  function.   The  gating  function  is  performed  on  the 
overall  amplitude  waveform,  and  hence  is  not  subject  to  DC  pulses  or 
frequency  selective  gating  as  in  rotating  head  devices.   The  result  is 
as  smooth  a  switching  characteristic  as  can  be  obtained  with  a  non- 
synchronous  system. 
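
The interplay of retained and deleted segments can be made concrete with a small timing model. This is a simplified sketch, not a description of the actual circuitry: switching is treated as instantaneous (the two millisecond graded transfer is ignored), times are in milliseconds, and the names `retained_intervals` and `output_duration_ms` are hypothetical.

```python
def retained_intervals(total_ms, retain_ms, delete_ms):
    """Portions of the recording that reach the output, as (start, end) pairs.

    retain_ms[i] is the time spent on head i (its timing potentiometer);
    delete_ms[i] is the duration skipped when switching from head i to
    head i + 1 (set by the head spacing). The final head has no
    potentiometer and plays until the end of the recording.
    """
    if len(retain_ms) != len(delete_ms):
        raise ValueError("one retain time per head gap is required")
    intervals = []
    pos = 0
    for keep, skip in zip(retain_ms, delete_ms):
        end = min(pos + keep, total_ms)
        if end > pos:
            intervals.append((pos, end))
        pos = end + skip                  # jump ahead to the next head
        if pos >= total_ms:
            return intervals              # the recording is exhausted
    intervals.append((pos, total_ms))     # last head plays out the rest
    return intervals

def output_duration_ms(intervals):
    """Length of the compressed output."""
    return sum(end - start for start, end in intervals)
```

With six potentiometers at 50 milliseconds and six equal deletions of 20 milliseconds, a 600 millisecond recording is reduced to 480 milliseconds, which would fit the half-second slot.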

The  moveable  heads  and  the  timing  potentiometers  may  be  adjusted 
to  within  a  fraction  of  a  millisecond  of  the  desired  switching  points. 
Since typical pitch periods are on the order of seven to fifteen
milliseconds,  it  is  quite  easy  to  adjust  the  switching  points  to  delete 
an  integral  number  of  pitch  periods  to  within  about  five  percent  of  the 
period  length.   This  capability,  coupled  with  the  cross  fading  process, 
makes  it  possible  to  remove  pitch  discontinuities  at  the  switching 
points  and  achieve  a  good  approximation  to  the  pitch  synchronous 
compression  described  earlier. 

The real flexibility of SPACELOOP comes into play when the
segment deletion points are adjusted to occur at non-synchronous,
predetermined points in the utterance. During extensive study into
the use of non-synchronous deletion, it has been found possible to
delete  samples  from  certain  phonetic  environments  without  changing 
the  apparent  pace  of  the  word.   For  example,  it  is  possible  to  remove 
large  portions  from  voiceless  fricatives,  particularly  those  at  the 
beginning  and  end  of  a  word,  without  changing  the  apparent  pace. 
Further, certain voiceless stops are quite long in certain phonetic
environments,  and  here  it  is  possible  to  remove  up  to  about  50%  of 
the  stop  duration  without  changing  the  apparent  pace.   On  the  other 
hand,  if  we  remove  even  10%  of  the  central  vowel  in  a  CVC  sequence, 
such  as  in  the  word  "six",  the  pace  increase  is  apparent. 

The great concern for retaining the pace of an utterance is
brought  about  by  the  particular  application  of  an  audio  response  unit. 
As  mentioned  earlier,  approximately  70%  of  the  words  in  a  particular 
vocabulary  do  not  require  compression,  since  they  are  sufficiently  shorter 
than 500 milliseconds in their natural state. If we are forced to compress
a particular word and thereby produce an apparent increase in utterance
rate, the effect on quality and intelligibility is quite poor when the
compressed word is placed between two uncompressed words. As demonstrated
in the earlier recording, when a word requires more than about 30%
compression, it is quite often placed in two time slots for playback
in succession during the production of a message. However, this process
can be quite wasteful of storage locations if a large vocabulary is
of interest.

I  shall  next  play  some  recordings  that  illustrate  the  performance 
of the SPACELOOP in compressing certain words and short phrases. (RECORDINGS)

There  are  a  number  of  interesting  performance  characteristics  of 
the SPACELOOP design, but one in particular may be of interest. Slide
3  shows  a  closeup  view  of  the  capstan  drive  mechanism.   It  is  a  difficult 
task  to  position  the  compressed  analog  recording  precisely  in  the  drum 
track  so  that  all  recordings  on  adjacent  tracks  begin  at  the  same  time. 
Since  the  drum  has  a  rather  large  mass,  it  is  impossible  to  position  it 
at  rest  and  then  start  its  motion  at  the  precise  time  the  recording  is 
ready for transfer from the SPACELOOP. Therefore, the SPACELOOP is
equipped with a fast, precision start feature. First, the tape loop
is positioned by hand to within one millisecond of tape time by monitoring
the timing pulse as it is moved back and forth over the pulse head. When
the tape is properly positioned, SPACELOOP is set to start automatically
when  it  receives  a  start  pulse  from  the  rotating  drum.   The  tape  transport 
is  designed  to  have  a  very  small  moving  mass  requiring  acceleration  when 
the  tape  is  brought  up  to  operating  speed.   The  largest  masses  in  the 
system for stabilizing tape velocity are the capstan and capstan idler
assemblies.   Both  of  these  units  are  rotating  constantly,  even  while 
the  tape  is  at  rest  when  the  capstan  idler  is  disengaged  from  the  capstan. 
An  auxiliary  capstan  just  to  the  rear  of  the  idler  keeps  the  idler  in 
motion  at  its  normal  running  velocity.   As  a  result  of  these  design 
precautions, only 45 milliseconds are required to obtain final stabilized
tape  velocity  after  a  start  pulse  is  received  from  the  rotating  drum. 
Almost  all  of  this  time  is  used  to  move  the  idler  into  contact  with 
the  tape  and  the  capstan.   Even  more  important,  the  start  time  is  repeatable 
to  within  one  millisecond.   This  means  that  a  compressed  recording  may 
be  placed  in  a  time  slot  to  within  two  milliseconds  of  its  desired 
position,  a  notable  synchronizing  feat  in  analog  recording  systems. 
The  drum  of  the  Audio  Response  Unit  is  shown  in  Slide  4,  along  with 
associated  equipment. 

In  conclusion,  two  techniques  for  compressing  or  expanding 
speech  by  the  time  segment  deletion  or  addition  method  have  been  described. 
The  first  was  pitch  period  compression  of  the  digitized  speech  signal 
in  a  computer.   This  technique  operates  quite  well  as  long  as  pitch 
period  measurements  may  be  precisely  made.   It  is  adaptable  to  compression 
of  continuous  speech  signals,  but  it  is  quite  expensive  in  terms  of 
computer  processing  time.   A  modification  of  this  approach  using  a 
real-time  pitch  tracker  in  conjunction  with  a  buffer  memory  that  contains 
only  two  or  three  pitch  periods  would  be  considerably  less  expensive  and 
could  function  in  real  time.   For  operation  on  continuous  speech,  this 
approach  would  be  quite  profitable. 

The  second  technique  described  was  asynchronous  analog  compression 
using our SPACELOOP system. This approach is the easiest to use for
compressing  short  utterances,  is  less  expensive,  and  affords  the  additional 
advantage  of  compressing  speech  without  increasing  its  apparent  pace 
in  many  instances. 

Assuming it is desirable to time-compress speech while minimizing
the  effects  of  increased  pace,  it  may  be  possible  in  the  future  to  combine 
the  best  characteristics  of  both  techniques  into  one  system.   The  remaining 
problem  would  be  to  automatically  recognize  the  phonetic  classes  where 
it  had  been  proved  that  time  segment  deletion  caused  a  minimum  of  apparent 
pace  increase.   The  solution  to  this  problem  in  speech  recognition  exists 
today  in  methods  such  as  those  described  elsewhere  by  Chapman  and  Dammann, 
although  they  have  not  been  brought  to  bear  on  the  specific  problem  of  time- 
compressing  speech. 

As  a  result  of  further  research,  it  may  be  possible  in  the  future 
to  time-compress  speech  on  a  continuous  basis  by  deleting  primarily  those 
portions  of  the  speech  signal  that  do  not  contribute  to  the  apparent  pace 
of  the  speech.   Considerable  research  will  also  be  required  into  the 
factors  which  contribute  to  the  pace  of  an  utterance  on  both  the  micro- 
and  macroscopic  levels.   The  tools  to  modify  the  speech  signal  in  such 
a  flexible  fashion  will  most  certainly  be  greeted  with  great  enthusiasm 
by  our  psychologist  friends. 




CHAPTER  XVI 


Signal  Analysis  Of  Speech  Time-Compression  Techniques 

S. Joseph Campanella*


It  is  the  intent  of  this  paper  to  discuss  in  some  detail  the 
time  compressed  speech  problem  from  the  point  of  view  of  the  type  of 
distortion  produced  by  the  processing  method  and  its  toll  on  the 
compressed speech intelligibility. Three techniques are selected for
this  discussion.   These  are:   (a)   simple  speed-up  by  playback  of  the 
speech  signal  at  a  rate  greater  than  that  at  which  it  was  recorded; 
(b)   reduction  of  the  time  occupied  by  the  speech  message  achieved  by 
discarding  of  segments  of  the  message,  and   (c)   resolution  of  the  speech 
message  into  parameters  which  govern  its  spectrum  composition  followed 
by  generation  of  speeded-up  remade  speech  by  playback  of  the  parameters 
into a speech synthesizer. Of these techniques, the most economical to
implement is the first and the most costly the last. As might be
anticipated, the one promising to yield the most satisfactory time-compressed
speech in terms of achieving the maximum degree of speed-up
with  a  maximum  of  intelligibility  of  the  final  product  is  the  last. 
This  latter  method  also  is  the  most  expensive  and  complicated  in 
terms  of  implementation.   Each  of  these  techniques  is  discussed  in 
order  below. 

I.   TIME-COMPRESSION BY SIMPLE SPEED-UP

Time  compression  achieved  by  simply  speeding  up  the  play-back 
of  a  recorded  speech  message  obviously  increases  the  rate  of  occurrence 
of  the  events  in  the  speech  message  and  hence  does  achieve  the  desired 
objective of increasing the rate of information delivery. However,
this  method  has  an  attending  effect  that  mitigates  the  success  that  can 
be  obtained  and  severely  limits  the  maximum  of  speed-up  that  can  be  achieved. 
This  undesirable  effect  is  the  shift  in  the  frequency  composition  of  the 
signal  spectrum  caused  by  multiplication  of  the  frequencies  which  compose 
the  original  spectrum  by  a  constant  equal  to  the  ratio  of  the  speed  of 
playback  to  the  speed  of  record.   Thus,  for  example,  a  pair  of  frequency 
components  occurring  at  300  and  400  Hz  respectively  in  the  original 
signal spectrum are, when played back at a speed-up of two times,
translated  to  frequencies  of  600  and  800  Hz  respectively.  Thus  it  is 
seen  that  each  frequency  component  is  multiplied  by  the  speed-up  ratio. 
Also it should be noted that the difference frequencies between components
are multiplied by the same ratio. This is actually a doppler shift effect
of  the  same  type  that  occurs  when  sound  originating  on  a  moving  vehicle 
is  perceived  by  one  standing  in  rest  coordinates. 
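
The frequency multiplication described above can be checked numerically with a short sketch. It is an illustration only: the 8000 samples per second rate, the zero-crossing frequency estimate, and the function names are assumptions, and speed-up is modelled by keeping every second sample and playing the result at the original rate. Real speech would need low-pass filtering before samples are discarded in this way to avoid aliasing.

```python
import math

def sine(freq_hz, duration_s, sample_rate, phase=0.1):
    """A pure tone; the small phase offset keeps samples off exact zero."""
    n_samples = int(round(duration_s * sample_rate))
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate + phase)
            for n in range(n_samples)]

def fast_playback(samples, ratio):
    """Integer speed-up: keep every `ratio`-th sample, play at the same rate."""
    return samples[::ratio]

def zero_crossing_freq(samples, sample_rate):
    """Rough frequency estimate: half the number of sign changes per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0.0)
    duration_s = len(samples) / sample_rate
    return crossings / (2.0 * duration_s)

RATE = 8000
for original_hz in (300, 400):
    tone = sine(original_hz, 1.0, RATE)
    fast = fast_playback(tone, 2)
    # Half the duration, and the estimated frequency is roughly doubled.
    print(original_hz, len(fast) / RATE, round(zero_crossing_freq(fast, RATE)))
```

The 300 and 400 Hz components of the example move to roughly 600 and 800 Hz, so the spacing between them is multiplied by the speed-up ratio as well.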

The doppler shift effect is not tolerable to the human ear. It
causes  a  shift  in  the  fundamental  period  of  the  larynx  excitation  that 
is  perceived  by  the  listener  as  a  higher  frequency  of  fundamental  pitch 
when  auditioning  the  speed-up  playback.   An  effect  that  is  probably 
even worse is the shift in the frequency positions of the formant frequencies.
Formant frequencies are defined as regions of the frequency spectrum where
energy is concentrated. Formants are readily seen in the three dimensional
display of spectrum intensity versus time and frequency that is produced
by the widely used sonagraph spectrum analyzers. The steady state and
dynamic  behavior  pattern  of  these  formants  is  responsible  for  phoneme 
perception  and  hence  word  perception  by  the  human  listener.   When  this 
pattern  undergoes  the  doppler  shift  translation  produced  by  speeded-up 
playback,  the  capacity  of  the  human  perception  system  to  accommodate 
is  rapidly  out-distanced.   The  human  perception  system  is  thus  faced 
not  only  with  the  problem  of  increased  information  flow  but  combined 
with  this  the  problem  of  accepting  this  increased  information  flow  in 
terms  of  a  doppler  shifted  spectrum  energy  distribution.   If  this 
doppler  shift  in  the  spectrum  energy  distribution  could  be  eliminated 
while  retaining  the  increased  rate  of  presentation  of  information,  then 
one  could  expect  a  greater  facility  for  comprehending  the  time-compressed 
speech  at  any  given  speed-up  factor.   If  this  is  accomplished,  it  would 
be  as  if  the  talker  simply  talked  faster  with  no  change  in  either  his 
frequency  of  pitch  or  formant  frequency  range. 

The doppler shift spectrum distortion produced by simple speed-up
playback is easily seen to be unnatural and hence poses to the human
listener  a  difficult  perception  problem.   Speech  is  generated  by  modulating 
the  larynx  impulses  by  the  dynamic  action  of  the  human  vocal  cavity 
and  by  the  noise-like  sounds  produced  by  forcing  air  through  a  variety 
of vocal cavity constrictions formed during the act of articulation.
The  sound  spectrum  produced  as  a  consequence  is  thus  determined  by  the  basic 
physical  features  of  the  vocal  cavity  and  the  dynamic  pattern  performed 
in  the  process  of  articulation.   The  basic  physical  features  are  largely 
fixed  by  the  size  and  shape  details  of  the  talker's  vocal  cavity.   The 
spectrum  energy  distribution  of  the  sound  produced  by  the  action  of  this 
cavity  on  larynx  impulses  and  by  production  of  noise  at  various  constriction 
sites  within  it  is  thus  confined  in  its  range,  being  specified  by  the  vocal 
cavity's  transmission  characteristic  for  the  configurations  it  assumes. 
Formant  frequencies  are  simply  normal  mode  resonances  of  the  vocal 
cavity  transmission  response.   When  a  human  speeds  up  his  rate  of  utterance, 
he  is  free  to  change  the  rate  at  which  he  moves  his  articulatory  controls 
but  he  is  not  able  to  change  the  frequency  range  occupied  by  the  sounds 
emitted.   Thus  in  viewing  the  spectrum  pattern  of  fast  speech  as  opposed 
to  slow  speech  produced  by  a  human  talker  it  is  seen  that  the  formant 
frequencies  occupy  the  same  frequency  range  and  the  only  difference  between 
fast  speech  and  slow  speech  is  the  increased  rate  of  occurrence  of  changes 
in  the  spectrum  energy  pattern.   As  humans  we  have  learned  to  perceive 
speech  in  terms  of  the  relatively  fixed  range  of  frequencies  over  which  the 
speech  spectrum  pattern  can  range,  this  range  being  determined  largely  by 
the  physical  characteristics  of  the  human  vocal  cavity.   When  we  violate 
this  "built-in"  limit,  as  is  done  in  the  case  of  speeded-up  speech, 
perception  becomes  difficult  and  perhaps  even  impossible. 
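
The frequency shift caused by simple speed-up playback can be illustrated with a few lines of arithmetic. The sketch below assumes only that playback at a speed-up factor k multiplies every frequency in the recording by k; the pitch and formant values are hypothetical and are not taken from the text.

```python
def speed_up_frequencies(frequencies_hz, speed_up_factor):
    """Frequencies heard when a recording is simply played back faster."""
    # Every spectral component is scaled by the same factor.
    return [f * speed_up_factor for f in frequencies_hz]


# Hypothetical values: fundamental pitch followed by three formants (Hz).
original = [120.0, 500.0, 1500.0, 2500.0]
shifted = speed_up_frequencies(original, 2.0)
print(shifted)  # [240.0, 1000.0, 3000.0, 5000.0]
```

At 2:1 the whole pattern moves up an octave, outside the range that a human vocal cavity of the same size would produce.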

II.   TIME-COMPRESSION  BY  DISCARDING  SEGMENTS

From  the  preceding  discussion  it  is  seen  that  time-compression  of 
the  speech  message  by  simple  speed-up  is  attended  by  an  undesirable  doppler- 
shift  distortion  that  produces  an  unnatural  spectrum  energy  distribution 
that militates against successful perception by the human listener. One method
of  achieving  time-compression  which  avoids  the  problem  of  spectrum  doppler 
shift  is  to  simply  discard  portions  of  the  speech  message  by  removal  of 
segments  so  that  the  result  can  be  played  back  in  less  time  and  with  no 
attending doppler-shift of the frequency spectrum. Obviously, this method
should preferably not remove complete words or sentences. This should
be  done  beforehand  by  appropriately  editing  the  text  and  any  further 
removal  of  total  words  or  groups  of  words  should  thus  be  assumed  to 
be  injurious  to  message  content.   Rather  it  is  the  objective  to  remove 
segments  of  the  speech  message  in  a  manner  that  does  not  remove  textual 
content but simply makes rate of utterance faster. The fact that this
feat can indeed be accomplished is due to the great amount of redundancy
present in the human speech message combined with the occurrence of a
reasonably high percentage of intervals of either silence or insignificant
signal energy. Redundancy is responsible for the commonly observed fact
that a speech message can be understood in a high level of noise background.
In the time domain of the facsimile signal this redundancy shows up in
the repetition of pitch periods of almost identical waveform. This
same  redundancy  shows  up  in  the  spectrum  of  the  speech  message  in  the 
observation  of  relatively  long  time  periods  of  virtually  unchanging 
formant  structure.   It  is  probable  that  more  than  50%  of  the  time 
occupied  by  the  speech  message  is  devoted  to  repetition  of  information 
which  has  already  been  presented  and  this  can  be  removed  by  appropriately 
sampling  the  speech  message  so  that  a  portion  is  preserved  and  a  redundant 
portion  is  discarded.   Provided  this  processing  is  performed  so  that 
non-redundant information is not lost, it is likely that compression of
the  speech  message  of  2:1  or  more  which  avoids  the  doppler  shift  spectrum 
distortion  can  be  achieved  without  seriously  degrading  intelligibility. 

One  simple  method  of  achieving  such  processing  is  provided  by 
scanning  a  moving  magnetic  tape  record  with  a  rotating  head  assembly. 
Four  scanning  heads  are  mounted  on  a  rotating  capstan  over  which  the 
tape  runs.   The  tape  comes  in  contact  with  the  capstan  surface  over  a 
90°  arc.   Thus  one  head  is  always  scanning  that  portion  of  the  tape 
that  is  wrapped  over  a  90°  arc  of  the  capstan  surface.   The  signals 
generated by the four heads are summed to produce the output. For a
2:1 time compression, the tape is translated at a speed equal to twice
that of the original recording speed while the scanning head is moved
at the same speed as the original recording speed. The net effect of
this arrangement is to preserve 50% of the signal content on the original
record, and this 50% occurs at twice the rate of occurrence of the input
signal but with no doppler shift. The sampling has a 50% duty cycle, i.e.,
for each interval retained an equal interval is discarded. The rate at
which this sampling occurs is determined by the diameter of the scanning
capstan and the velocity of the tape. For a tape speed of S and a capstan
diameter D, the sampling period duration is

$$T = \frac{\pi D}{8 S}$$

This period should be sufficiently long so as to avoid loss of the shorter
phonemes. A value of 10 milliseconds should be sufficiently high to achieve
this end. Using this value along with the assumption of a tape speed of
30 ips, the scanning capstan diameter turns out to be 0.764 inches. Performance
of such a system can be considerably enhanced by the use of a symmetrically
distributed time weighting function during the sampling interval, rather
than use of a rectangular time window. This will remove discontinuities
in the output signal that might otherwise be annoying.
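
The capstan diameter quoted above can be checked numerically. This is a minimal sketch that reads the relation for the sampling period in the text as T = πD/(8S) and solves it for D; the function names are illustrative only.

```python
import math


def sampling_period(diameter_in, tape_speed_ips):
    """Sampling period T in seconds, using T = pi * D / (8 * S)."""
    return math.pi * diameter_in / (8.0 * tape_speed_ips)


def capstan_diameter(sampling_period_s, tape_speed_ips):
    """Capstan diameter D in inches, from the same relation solved for D."""
    return 8.0 * tape_speed_ips * sampling_period_s / math.pi


# Values from the text: T = 10 milliseconds, S = 30 inches per second.
d = capstan_diameter(0.010, 30.0)
print(round(d, 3))  # 0.764
```

The result agrees with the figure of 0.764 inches given in the text.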

The system just described proposes to achieve the time compression
by what amounts to arbitrary discard of 50% of the message content in
segments so small that even the shortest anticipated event in normal speech
is not totally discarded. It could be made better by more selective discard,
but this would be accomplished at a significantly increased complexity
and cost. It is unlikely that such a system could be made to operate
for time compression in excess of 2:1 without incurring significant loss
in intelligibility. Methods for achieving the same effect using circulating
digital delay lines, which avoid all moving parts and provide considerable
design flexibility over the method described in the foregoing, are available,
but space does not allow their description here. Rather, it has been
the intent to describe the general technique and assay its consequences.
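
The discard method of this section can be imitated on a digitized signal. The sketch below is an illustration only and is not the rotating-head or delay-line implementation referred to above: it keeps one segment, discards the next, and joins the retained segments with a short linear crossfade, which stands in here for the symmetric weighting function mentioned in the text. The function name and parameters are assumptions made for the example.

```python
def compress_by_discard(samples, segment_len, fade_len=0):
    """Time-compress roughly 2:1 by keeping one segment and discarding the next.

    Retained segments are joined with a linear crossfade of `fade_len`
    samples (an assumed substitute for the text's weighting function).
    """
    if segment_len <= 0 or not 0 <= fade_len <= segment_len:
        raise ValueError("need segment_len > 0 and 0 <= fade_len <= segment_len")
    out = []
    hop = 2 * segment_len  # keep one segment, skip the next (50% duty cycle)
    for start in range(0, len(samples), hop):
        # Take the retained segment plus a short tail used only for fading.
        chunk = samples[start:start + segment_len + fade_len]
        overlap = min(fade_len, len(out), len(chunk))
        base = len(out) - overlap
        # Blend the tail of the output into the head of the new chunk.
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)
            a = out[base + i]
            out[base + i] = a + (chunk[i] - a) * w
        out.extend(chunk[overlap:])
    return out


# With no crossfade, alternate segments are simply dropped.
print(compress_by_discard(list(range(8)), 2))  # [0, 1, 4, 5]
```

Because samples are only removed and never resampled, the frequencies within each retained segment are unchanged, which is the property the section relies on.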

III.   TIME  COMPRESSION  USING  PARAMETRIC  REPRESENTATION

Parametric  representation  refers  to  the  replacement  of  the  time 
facsimile  speech  signal  by  a  set  of  parameters  that  express,  in  an  efficient 
form,  the  speech  information  content  of  the  spectrum  distribution  pattern 
of  speech  in  time  and  frequency.   By  resolving  the  time  facsimile  speech 
signal  into  a  set  of  such  parameters,  it  is  possible  to  use  them  to  remake 
the  speech  message  at  a  speeded-up  rate  simply  by  the  playing  back  of 
the  parameters  at  the  speed-up  ratio  desired  into  a  device  called  the 
synthesizer. The synthesizer literally remakes the speech signal. The
greatest advantage offered by such a system is that it can achieve very
great speed-up without either the undesirable spectrum distortion that
is experienced with the simple speed-up method, or the
necessity for discarding any portion of the message, as was the case for
the sampling method for producing time compression.

Devices  for  accomplishing  this  type  of  time-compression  already 
exist  although  their  original  intent  is  not  time-compression  but  rather 
data-compression. They bear the general name VOCODER. A typical VOCODER
consists  of  an  analyzer  which  has  as  its  function  the  decomposition  of 
the  continuous  facsimile  speech  message  into  a  set  of  parametric  control 
signals  and  a  synthesizer  which  remakes  the  speech  message  from  the 
parametric control signals. The principal feature of the VOCODER that
is important to time-compression of speech lies in the fact that speeded-up
play-back of the parametric control signals into the synthesizer
results  in  remade  speech  that  is  speeded-up  in  the  occurrence  of  articulatory 
events  but  in  which  no  spectrum  translation  nor  discarding  of  segments 
of  the  message  takes  place.   It  is  almost  as  if  the  talker  simply  increased 
his  rate  of  utterance.   As  such,  the  frequency  range  occupied  by  his 
fundamental  pitch  and  that  occupied  by  the  normal  modes  of  his  vocal 
cavity  transmission  response  remain  unaltered  from  those  occupied  at 
his  normal  rate  of  utterance. 

Vocoders applicable to achieving time-compression fall in two
general classes: Channel Vocoders and Formant Vocoders. In the Channel
Vocoder, the parametric control signals consist of the detected (average
peak-intensity) outputs of a set of contiguous band-pass filters which
cover the speech spectrum range from 100 to 5000 Hz. Thus this set of
parameters governs the balance of spectrum energy that is transmitted
from the human vocal cavity. The number of channels employed has ranged
from as few as 10 to more than 20. Presently, machines using 16 channels
have attained a very high degree of perfection. These outputs are also
accompanied by a pitch frequency control signal, which provides an analog
measure of the frequency of pitch of the larynx, and a buzz-hiss control
signal, which instructs the synthesizer to employ buzz excitation for larynx
excited segments and noise excitation for friction excited segments. In the
synthesizer of the Channel Vocoder, speech is remade by controlling the spectrum
energy balance of an artificial periodic larynx excitation or a noise excitation
in accordance with the pattern specified by the detected outputs of the analyzing
filters. The frequency of the artificial larynx excitation is controlled by the
pitch frequency parameter and the selection of buzz or hiss by the buzz-hiss
decision control parameter.

In the case of the Formant Vocoder, the end result is much the same,
but the means of accomplishing it differs considerably. In the Formant Vocoder
Analyzer, the speech signal spectrum distribution is resolved not into
spectrum amplitude parameters but into formant trace parameters.
These specify the center frequency position of a formant as a function of time.
As previously stated, the formant frequencies are the normal modes of the vocal
cavity transmission response. It is necessary that three formants be employed
if all of the sounds of speech required for word communication are to be
accommodated. To control the excitation intensity, two intensity controls
are employed, one for specifying the intensity of the periodic artificial larynx
pulse excitation and the other for the hiss (noise-like) excitation. Also,
as in the case of the Channel Vocoder, a control for the fundamental period
of the larynx excitation, i.e., the fundamental pitch frequency, is extracted.
In the Formant Vocoder Synthesizer, the vocal cavity spectrum balance is
reconstituted by means of an electrical analog network of the human vocal cavity
containing circuits which regenerate the normal modes of the original talker's
vocal cavity under control of the three formant trace parameters. Excitation to
this electrical analog is provided from either a source of artificially generated
larynx impulses at the fundamental pitch period rate or from a noise source.
Balance of these excitations is controlled by the voice and noise excitation
amplitude controls.

Both  of  these  vocoder  types  produce  synthesized  speech  under  control  of 
parameters  and  simply  by  speeding  up  the  rate  of  occurrence  of  these  parametric 
controls  the  rate  of  speech  utterance  is  speeded-up.   In  my  laboratory  at  Melpar, 
Inc.,  one  of  our  principal  areas  of  interest  is  Formant  Vocoder  research  and 
development. Thus it is that we have in existence equipment capable of generating
remade speech at virtually any rate desired. We have in fact constructed a special
purpose graphic playback device which permits playback of twelve parallel graphic
record replicas of control parameters drawn on a chart in electrically conducting
ink. These are used to control a Formant Vocoder Synthesizer. This system,
called EVA (Electronic Vocal Analog), can easily be used to demonstrate speeded-up
speech made from speeded-up playback of the parametric controls of the formant
vocoder type. A photograph of the EVA is shown in Figure 1. Our experience
indicates  that  speech  remade  at  speed-up  factors  easily  in  excess  of  5:1  can  be 
readily  perceived  by  the  listener.   It  is  expected  that  equally  good  results 
can  be  achieved  by  speeding-up  the  control  parameters  of  the  Channel  Vocoder 
into  a  Channel  Vocoder  Synthesizer.   We  have  not  made  quantitative  tests  of 
intelligibility  as  a  function  of  speed-up  but  it  is  obvious  from  even  a  casual 
listening  that  the  remade  speech  generated  is  considerably  better  than  that 
produced  by  the  expedient  of  simply  speeding-up  a  direct  recording. 
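
The idea of speeding up the control parameters rather than the waveform can be sketched as follows. This is an illustration under stated assumptions, not a description of EVA or of any particular vocoder: each frame is taken to be a tuple of control values (for example pitch and formant frequencies in Hz) sampled at a fixed frame interval, and the track is resampled by linear interpolation so that it runs faster at the same frame interval. The frame values are hypothetical.

```python
def compress_parameter_track(frames, speed_up):
    """Resample control-parameter frames so the track plays `speed_up` times faster."""
    if speed_up <= 0 or not frames:
        raise ValueError("need a positive speed_up and at least one frame")
    last = len(frames) - 1
    n_out = max(1, int(round(len(frames) / speed_up)))
    out = []
    for k in range(n_out):
        pos = min(k * speed_up, float(last))  # position in the original track
        i = int(pos)
        j = min(i + 1, last)
        frac = pos - i
        # Interpolate each parameter between neighbouring frames.
        out.append(tuple(a + (b - a) * frac for a, b in zip(frames[i], frames[j])))
    return out


# Hypothetical frames: (pitch Hz, first formant Hz).
track = [(100.0, 500.0), (110.0, 600.0), (120.0, 700.0), (130.0, 800.0)]
fast = compress_parameter_track(track, 2.0)
print(fast)  # [(100.0, 500.0), (120.0, 700.0)]
```

The parameter values stay within their original ranges; only the rate at which they change is increased, which is why no spectrum translation results when such a track drives a synthesizer.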

IV.   CONCLUSIONS 


Several  means  of  time-compressing  speech  have  been  discussed.   It  is 
pointed  out  that  the  best  time-compression  results  will  probably  be  obtained  by 
speeded-up  playback  of  the  control  parameters  used  in  either  the  Channel  Vocoder 
or Formant Vocoder into their respective synthesizers. Time compressed speech obtained
by this method avoids the doppler frequency distortion that one experiences
by simply speeding-up the playback of a facsimile signal record, and it does
not incur the loss of data that is experienced in systems that offer to produce
time compressed speech without doppler frequency distortion by discarding a portion
of the record just sufficient to permit playback without frequency shift. It
is regrettable that the vocoder type systems that provide the greatest promise
are also the most expensive to implement. Perhaps this greater cost can be
offset somewhat by digital computer implementation of vocoders using high-speed
spectrum analysis programs currently being devised by numerous organizations.

Figure 1. EVA, A Graphic Controlled Formant Synthesizer


CHAPTER XVII

Effects and Interaction Effects of Speaking Rate,
Visual Limitation and Intelligence Level on
Aural Acquisition and Retention of
Sentences


Wietse  de  Hoop* 

Mr.  Chairman,  Ladies  and  Gentlemen, 

It  is  rather  habitual  to  begin  a  presentation  of  this  nature  with  a 
brief  review  of  previous  work  related  to  the  topic.   Because  of  the  specific 
interest  of  this  select  audience,  I  feel  that  a  review  of  the  literature  on 
compressed  speech  and  its  relationship  to  special  education  is  out  of  place, 
and I will therefore mention only those studies which influenced the particular
topic of my study.

My interest in rapid speech was aroused by two occurrences: the study
of Foulke and associates (1962) on blind Ss, and the study by Spicker (1963).
In an earlier study (de Hoop, 1965) the writer's interest focused on listening
comprehension of crippled and cerebral palsied Ss. This study left the writer
wondering about the differences, if any, between comprehension and learning
when rapid speech is used. Several investigators have studied verbal learning
of the mentally retarded. For recent summaries and discussions of such studies,
the listener may be referred to handbooks by Ellis (1963) and by Stevens and
Heber (1964). The greater part of these investigators utilized paired-associate
or serial learning paradigms. An aspect which both the comprehension studies
and the verbal learning studies have in common is that both try to deal with
the influence of meaning on learning. For those interested in rapid speech,
this is a highly significant aspect.

Generally  speaking,  previous  research  may  be  summarized  as  follows: 

a.  Speaking  rates  from  about  175  to  200  wpm  yield  the  best  comprehension 
of  the  presented  text. 

b. Increase in speaking rate beyond 200 wpm is followed by decrease in
comprehension.

c. Mentally retarded and normal Ss differ in serial learning.

d.  Compressed  and/or  expanded  speech  seems  to  hold  promises  for 
application  in  special  education. 

PURPOSE 

The  purpose  of  the  present  study  was  to  obtain  information  regarding 
acquisition  and  retention  of  verbal  stimulus  materials  by  visually  limited 
and  mentally  retarded  individuals  when  these  verbal  stimulus  materials  are 
presented  aurally  at  four  different  speaking  rates,  namely  a  basal  speaking 
rate of approximately 175 wpm, and three compressed rates, respectively 20%,
40%, and 60% compressed or, in rates, approximately 210, 245, and 280 wpm.
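
The relation between the compression percentages and the quoted word rates can be checked with simple arithmetic. The sketch below compares two possible readings of "percent compressed"; which reading was intended is not stated in the text, and the function names are illustrative only.

```python
BASE_RATE_WPM = 175  # basal speaking rate given in the text


def rate_if_percent_is_rate_increase(base_wpm, percent):
    """Word rate if the percentage is an increase in words per minute."""
    return base_wpm * (100 + percent) / 100


def rate_if_percent_is_time_removed(base_wpm, percent):
    """Word rate if the percentage is the share of playing time removed."""
    return base_wpm * 100 / (100 - percent)


for percent in (20, 40, 60):
    print(percent,
          rate_if_percent_is_rate_increase(BASE_RATE_WPM, percent),
          rate_if_percent_is_time_removed(BASE_RATE_WPM, percent))
```

The quoted rates of 210, 245, and 280 wpm match the first reading. Under the second reading, removing 60% of the playing time would give 437.5 wpm.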


*Dr. Wietse de Hoop is with West Georgia College, Carrollton, Georgia.

PROCEDURES

Stimulus materials. Four sentences were selected as stimulus materials.
These sentences were obtained from one and the same story, namely the Ring-Tailed
Buzz Saw by Donald Knight (Simpson, 1962). An effort was made to select sentences
of equivalent length, dealing with the same topic, yet differing enough to prevent
a combination in a logical sequence by the Ss. The phonetic syllable (Potter, 1962)
was used as a response measure. In order to ensure that each sentence had an
equal number, namely 23 phonetic syllables, the original text of some sentences
was modified slightly.

The length of the sentences was based on two considerations. First,
several investigators have presented evidence about the mean sentence length
used by mentally retarded and normal children. Siegel (1962) reported that
retardates with a mean age of 14.6 years reacted to TAT pictures with sentences
averaging 6.4 words. Templin (1964) found a mean sentence length of 7.6 words
for normal 8 year old Ss. Black (1961) used sentences of 3, 5, 7, 9, 11, 13,
15, and 17 words when studying the effect of the length of a statement on auditory
reception of airmen, and found the longest sentence least understood. Postman, Turnage
and Silverstein (1964) studied the terminal number of items reproduced in the
correct serial order of a message, the so-called "running memory span". They
found that a clear primacy effect, in which initial words are recalled better,
as well as a clear recency effect, in which end words are recalled better, could
be demonstrated for a serial list of 10 words. These effects were much less
evident for a list of 15 words, and the primacy effect disappeared completely
with a 20-word list. Since Postman's Ss were college students, it was hoped
that sentences of 23 phonetic syllables might prevent a primacy effect in the
younger Ss of this study.

A  second  consideration  was  the  desire  to  have  sentences  of  sufficient 
length  to  prevent  easy  acquisition,  but  not  long  enough  to  make  perfect  acquisition 
impossible.

The sentences were then taped through the good services of the Athens,
Ga., chapter of "Recordings for the Blind". The reader was Dr. Joan Berryman, then
a graduate student in speech and hearing. Mrs. Berryman passed the test which
"Recordings for the Blind" requires of its readers. Through the good services
of Dr. Carson Nolan of the American Printing House for the Blind, the sentences
were then compressed to the desired percentages of 20%, 40%, and 60%. Just for
the record, it may be noted that initially we had planned to include a fifth
sentence, representing 20% speech expansion. However, this rendition appeared
to be rather garbled, so that this part of the study was eliminated.

Somehow the problem of possible differences between each of the five
sentences had to be tackled. Complete randomization of each sentence and each
speaking rate for each S was impractical. It was therefore decided to do a
pilot study on a small random sample of 10 normal Ss, using the rate of 175
wpm. The results were encouraging: the differences between the sentences
were not significant. It was therefore decided to first combine a particular
sentence with a particular speaking rate by means of a table of random numbers,
and then present these sentences in the order of the progressing speaking
rates, namely, 175 wpm, 210 wpm, 245 wpm, and 280 wpm. The order of speaking
rates and sentences was as follows:

175 wpm: The log which floated in midstream became a temporary resting
         place for the fleeing coon.
210 wpm: The dogs picked up a fresh trail of the animal near a big tree
         where the coon had been sleeping.
245 wpm: The ringtail would probably be hiding in a snug nest of leaves
         on a big branch in the tree.
280 wpm: The spray was flashing about in the sunshine like broken glass
         when the coon attacked the old dog.

One aspect may be mentioned before leaving the description of
the stimuli and the responses. Verbal materials in learning studies
are rather often presented as visual stimuli, while the response modes
ask for either an oral or a motor response. In listening comprehension
studies, the stimulus materials are presented aurally, but the responses
are  usually  elicited  by  way  of  a  multiple  choice  pencil  and  paper  test. 
Without  criticizing  such  studies,  it  appears  nevertheless  to  the  writer 
that  it  is  desirable  to  be  consistent  in  the  use  of  sense  modalities 
for  stimulation  and  for  response.   It  was  therefore  decided  to  present 
the  stimuli  aurally  and  to  require  an  oral  response. 

Subjects. Subjects were 38 visually limited and 38 normally
sighted pupils, each divided into two subsamples, namely 14 mentally
retarded and 24 subjects of normal intelligence. Twenty-eight visually limited
Ss were selected from the Georgia Academy for the Blind at Macon, Ga.,
and 10 from special classes for visually limited pupils in the City
of Atlanta. Miss Aurelia Davis, Coordinator of Special Education
for the Atlanta City Schools, and her staff, and Mr. Lee Jones, Superintendent
of the Georgia Academy for the Blind, and his staff, have not
just been most helpful; in fact, they went out of their way to make
the study possible. The normally sighted Ss were selected from one
of the Clarke County (Georgia) Public Schools. Miss Dorothy Firor, principal,
was most cooperative. Intelligence level of all Ss was based on the
verbal scale of the WISC, which was administered by the author and a
friend, Dr. Murray Tilman. In order to be included in the sample,
the following criteria had to be met: an IQ of 60 or higher; CA between
7 years and 6 months and 14 years and 6 months; no gross physical
impairments or uncorrected minor physical impairments which might
interfere with the task performance; absence of a hearing loss of
more than 15 dB; no major emotional disturbance.

The two S samples did not differ significantly in CA and in
IQ, as Table 1 shows.

TABLE 1
MEANS AND DISPERSIONS OF CA'S AND IQ'S

                       Visually limited        Normally sighted
                       MR         NI           MR         NI
                       N = 14     N = 24       N = 14     N = 24

CA (months)   Mean     139.8      135.3        138.7      127.1
              s        24.30      13.38        9.00       15.50
              Range    92-171     100-156      117-149    102-169

IQ            Mean     74.1       101.5        71.4       96.5
              s        6.05       15.65        5.81       5.43
              Range    63-81      89-135       61-80      89-105

MR = mentally retarded; NI = normal intelligence.

Presentation of sentences. The sentences were presented to each
S individually by means of a Silvertone Magnetic Sound Recorder Model
5283, in the order earlier indicated. The first trial consisted of a
block of three presentations of a sentence, after which the S would be
asked to repeat as many syllables as he or she remembered. The number
of syllables correctly repeated was considered as a measure of initial acquisition.
This was immediately followed by a second trial, again consisting of a
block of three presentations of the same sentence, after which the S
was again requested to repeat the syllables he remembered.

Two days later the S was asked first to recall the syllables
he remembered. The S would be given a cue to each sentence, consisting
of the first two words of each sentence. Immediately following this
recall, a block of three repetitions of each sentence was presented
again, followed by the eliciting of a second response measure. These
response measures are indicated as two-day retention. Finally, two
weeks later, the S was asked to recall as many syllables as possible
after presentation of the appropriate cue. This was considered as a
measure of long-term retention.

Analysis of data. The data were analysed by a three-factor
analysis of variance with repeated measurements on one factor. Because
of unequal sub-class frequencies, a least squares solution was employed,
which was derived from two models discussed by Winer (1962, p. 338, pp.
375-6). Thus, five separate comparisons were made, namely two for
acquisition, two for two-day retention, and one for two-week retention.
The analyses are listed separately in the appendix.

Hartley's F-Max test was applied to each of the five tables. In
each case, the F-Max ratios exceeded the critical value at the .05 probability
level, so that homogeneity of variance could not be assumed. It was felt,
however, that analysis of variance would not be invalid. Paired comparisons
were made by t-tests, taking into account Lindquist's recommendation (1953)
that, because of heterogeneity of variance, the t-tests of the difference
between the means of any two treatment groups should be based on the data
for these two treatments only, rather than on the mean square for within-treatments
computed from all treatment groups.
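
The F-Max statistic referred to above is the ratio of the largest to the smallest group variance. The sketch below shows only the computation of that ratio; it does not reproduce the analyses listed in the appendix. The standard deviations used are the CA values from Table 1, chosen purely as an illustration, and no critical value is looked up. Note also that the test is usually described for groups of equal size, whereas the groups here have 14 and 24 Ss.

```python
def f_max_ratio(standard_deviations):
    """Hartley's F-Max statistic: largest variance divided by smallest variance."""
    variances = [s * s for s in standard_deviations]
    if min(variances) <= 0:
        raise ValueError("all standard deviations must be positive")
    return max(variances) / min(variances)


# Illustration only: standard deviations of CA for the four groups in Table 1.
ca_sds = [24.30, 13.38, 9.00, 15.50]
ratio = f_max_ratio(ca_sds)
print(round(ratio, 2))  # 7.29
```

A ratio near 1 indicates similar variances; the larger the ratio, the less tenable the assumption of homogeneity.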

RESULTS 

The  statistical  data  seemed  to  justify  the  following  conclusions: 

Effects of Speaking Rate.

1. Acquisition and retention of the sentence presented at 280 wpm
was significantly less than at 175, 210, and 245 wpm. In fact, hardly any
acquisition took place at the speaking rate of 280 wpm.

2. Two-day retention of the sentence at 210 wpm was significantly
better than the one at 245 wpm, but did not differ significantly from the
one at 175 wpm.

3. Acquisition and two-day retention of the sentences presented at
175 wpm and at 245 wpm did not differ significantly.

4.  Two-week  retention  of  the  sentences  presented  at  175  wpm,  210 
wpm,  and  245  wpm  did  not  differ  significantly. 

DISCUSSION 

The first result, indicating that very little acquisition took place
at a rate of 280 wpm, is not in agreement with the findings of some other
investigators, like Goldstein (1940), Fairbanks (1957), and Foulke (1962).
These gentlemen found small, non-significant decrements in comprehension and
acquisition. Several factors may account for the difference. First, Pollack and
Pickett (1964) found that accurate speech perception
is highly dependent on contextual factors. Larger language samples produce
many more contextual factors, of course. Second, part of the difference may
be attributed to the Ss. Generally speaking, the Ss in the reported study were
younger than those of other investigators. This points up the desirability of
studying variation of speaking rate at various levels of development. Third, the
response measures may have contributed: previous investigators used some form
of recognition, while the present study utilized a reproduction method. The
retention curve for reproduction shows a downward displacement when compared with
the recognition curve.

The  second  finding,  namely  that  the  210  wpm  rate  yielded  the  best  acquisition, 
is in agreement with the results reported by Nelson (1948), Enc and Stolurow (1960),
and Foulke (1962), but not with the results of Fairbanks (1957) and Spicker (1963),
who  found  that  rates  from  140  to  175  wpm  yielded  the  best  results. 

Effects of Visual Limitation.

1. Visually limited and normally sighted Ss did not differ significantly
in acquisition and two-day retention of the sentence at 175 wpm.

2. Visually limited Ss scored significantly higher than normally sighted
Ss in both acquisition and two-day retention of the sentences presented at 210
and 245 wpm.

3. In two-week retention, the visually limited Ss differed significantly
from the normally sighted Ss at 175 wpm, but not at 210 and 245 wpm.

DISCUSSION

These results are generally what one would expect, with one exception.
Visually limited Ss showed a significant gain in acquisition scores between
the 175 and 210 wpm rates. This finding is difficult to explain. One might
wish to ascribe it to a learning-to-learn effect, but then, why would this
be true for the visually limited Ss and not for the normally sighted? One
might also consider motivation as a differentiating factor between the two
S samples. Or one might wish to ascribe the difference to chance. At
any rate, further investigation seems appropriate.

Effects of Intelligence Level.

1. As one would expect, normal Ss scored significantly higher than
the mentally retarded in all measures except those pertaining to the 280
wpm rate.

2. The mentally retarded Ss scored significantly higher at the
210 wpm rate than at the 175 and 245 wpm rates, except for two-week retention.

3. The normal Ss followed the same trend at the 210 wpm rate in
acquisition on first trial, but not in acquisition on second trial, two-day
retention, and two-week retention.

4. As was expected, no interaction was manifest between intelligence
level and visual limitation.

DISCUSSION 

The  results  enumerated  under  effects  of  intelligence  level  raise 
one  puzzling  question:   why  did  retardates  score  higher  on  the  210  wpm 
rate  in  this  study?   Spicker  (1963)  found  that  the  175  wpm  rate  yielded 
significantly  higher  scores  than  the  225  wpm  rate  with  both  normal  and 
retarded  Ss.   The  writer  has  no  explanation  other  than  some  possibilities 
mentioned before. Again, future research may provide more complete
information.

In conclusion, the writer would like to suggest that research on
the use of varying speaking rates be continued with normal as well as
exceptional Ss, preferably along a developmental continuum.


TABLE 2
ACQUISITION ON FIRST TRIAL
DESCRIPTIVE STATISTICS

Level of         Level of                             Rate of Presentation
Visual Acuity    Intelligence           N           175      210      245      280

Visually         Mentally Retarded     14    X     4.79    12.79     7.07     1.00
Limited                                      s     3.49     2.94     3.91     5.24

                 Normal Intelligence   24    X    13.46    16.83    12.58    11.26
                                             s     5.79     4.07     4.88     7.05

Normally         Mentally Retarded     14    X     9.57     9.14     3.93     3.57
Sighted                                      s     4.70     3.44     2.43     1.83

                 Normal Intelligence   24    X    10.46    12.54     9.88     2.63
                                             s     4.51     5.32     6.02     1.99

SUMMARY OF ANALYSIS OF VARIANCE

Source                     df    Mean Square         F

VL-NS                       1       160.67         4.65
MR-NI                       1       909.42        26.34
VL-NS x MR-NI               1       113.03         3.27
Error between              72        34.53

Rates                       3       620.67       143.67
VL-NS x Rates               3       112.37         9.96
MR-NI x Rates               3       106.80         9.47
VL-NS x MR-NI x Rates       3        59.14         5.24
Error within              216        11.28

TABLE 3
ACQUISITION ON SECOND TRIAL
DESCRIPTIVE STATISTICS

Level of         Level of                             Rate of Presentation
Visual Acuity    Intelligence           N           175      210      245      280

Visually         Mentally Retarded     14    X     8.57    15.64    11.50     2.71
Limited                                      s     4.16     3.05     4.55     1.38

                 Normal Intelligence   24    X    17.71    19.83    18.13     3.04
                                             s     5.97     2.71     2.38     1.88

Normally         Mentally Retarded     14    X    12.00    12.00     6.50     4.93
Sighted                                      s     4.77     4.49     2.14     2.16

                 Normal Intelligence   24    X    14.44    15.50    14.21     4.83
                                             s     5.46     5.59     6.32     3.25

SUMMARY OF ANALYSIS OF VARIANCE

Source                     df    Mean Square         F

VL-NS                       1       248.77         6.23
MR-NI                       1       266.54        31.73
VL-NS x MR-NI               1        49.73         1.25
Error between              72

Rates                       3      2276.03       233.20
VL-NS x Rates               3       168.38        17.25
MR-NI x Rates               3       165.93        17.00
VL-NS x MR-NI x Rates       3        51.85         5.31
Error within              216         9.76



TABLE 4
TWO-DAY RETENTION ON FIRST TRIAL
DESCRIPTIVE STATISTICS

Level of         Level of                             Rate of Presentation
Visual Acuity    Intelligence           N           175      210      245      280

Visually         Mentally Retarded     14    X     2.50     6.64     3.79     2.71
Limited                                      s     3.91     3.68     3.54     1.77

                 Normal Intelligence   24    X     7.17     7.38     6.38     3.33
                                             s     3.63     6.13     4.77     1.24

Normally         Mentally Retarded     14    X     4.64     3.00     1.54     2.14
Sighted                                      s     4.24     3.03     0.93     0.53

                 Normal Intelligence   24    X     6.50     6.88     5.46     2.67
                                             s     6.34     4.84     6.28     1.24

SUMMARY OF ANALYSIS OF VARIANCE

Source                     df    Mean Square         F

VL-NS                       1        21.59         0.59
MR-NI                       1       266.77         7.40
VL-NS x MR-NI               1         3.10         0.08
Error between              72        36.06

Rates                       3       204.79         6.94
VL-NS x Rates               3         7.30         0.60
MR-NI x Rates               3        27.72         2.29
VL-NS x MR-NI x Rates       3        12.16         1.00
Error within              216        12.09



TABLE 5
TWO-DAY RETENTION ON SECOND TRIAL
DESCRIPTIVE STATISTICS

Level of         Level of                             Rate of Presentation
Visual Acuity    Intelligence           N           175      210      245      280

Visually         Mentally Retarded     14    X    10.70    16.93    14.29     4.21
Limited                                      s     3.91     3.68     3.54     1.77

                 Normal Intelligence   24    X    20.38    19.83    19.46     5.29
                                             s     3.63     3.06     3.04     2.05

Normally         Mentally Retarded     14    X    12.76    14.07     7.43     6.07
Sighted                                      s     5.13     4.41     2.77     3.91

                 Normal Intelligence   24    X    16.25    16.21    15.33     6.50
                                             s     5.26     4.96     6.49     3.65

SUMMARY OF ANALYSIS OF VARIANCE

Source                     df    Mean Square         F

VL-NS                       1       289.27         7.13
MR-NI                       1      1139.39        28.10
VL-NS x MR-NI               1        21.20         0.52
Error between              72        40.55

Rates                       3      2067.97       236.34
VL-NS x Rates               3       139.69        15.96
MR-NI x Rates               3       159.55        18.23
VL-NS x MR-NI x Rates       3        61.42         7.02
Error within              216         8.75



TABLE 6
TWO-WEEK RETENTION
DESCRIPTIVE STATISTICS

Level of         Level of                             Rate of Presentation
Visual Acuity    Intelligence           N           175      210      245      280

Visually         Mentally Retarded     14    X     8.50     7.07     8.29     6.77
Limited                                      s     5.50     4.37     5.57     4.93

                 Normal Intelligence   24    X    12.58    12.25    11.08    10.20
                                             s     7.56     6.54     8.68     7.47

Normally         Mentally Retarded     14    X     3.14     7.00     6.79     4.86
Sighted                                      s     1.23     6.36     4.98     4.58

                 Normal Intelligence   24    X     8.92    10.67     7.58     7.71
                                             s     8.11     6.99     7.14     6.93

SUMMARY  OF  ANALYSIS  OF  VARIANCE 


Source 


df 


Mean  Square 


VL-NS 

MR-NI 

VL-NS  x  MR-NI 

Error  between 


1 

1 
393.81

6.11
10.83

5.92

0.09
26.24
2.71
2.42
0.80

24.84
MR-NI  x  Rates  3
Error  within  216


CHAPTER XVII
The Intelligibility Of Time-Compressed Speech

H. Leslie Cramer*

Time Compression Process

If we make a magnetic tape recording of speech at 7 1/2 inches per second
(ips) and play it at 15 ips, it can be heard in half the time in which it was
recorded. All the frequencies double, however, giving a high-pitched,
so-called "Donald Duck" effect. Speech so processed and played back is rather
unpleasant and practically unintelligible. With the development of magnetic
tape recording, it has become possible to halve playback time by another method,
without the attendant frequency shift of speeded-up playback. This other
method involves cutting a tape recording every one quarter of an inch from beginning
to end and discarding alternate pieces. The remaining pieces are then spliced
together to make a reconstituted tape that is half the length of the tape as
originally recorded. A tape processed by this chop-splice method will sound
normal on playback as far as the pitch of the speaker's voice is concerned,
although the words will obviously sound as though they were spoken rapidly.

Garvey and Henneman (1950) investigated word intelligibility of speeded
speech produced by this chop-splice method. Garvey (1953) reported that
intelligibility remains high, above 90 per cent, at a word-per-minute rate 2 1/2
times that of the original recording. Garvey further reports (1965) that after
completing his thesis with the chop-splice method he never wanted to see another
tape or splicer. It is fortunate for researchers that Fairbanks, Everitt, and
Jaeger (1953, 1954, 1959) developed an electro-mechanical system for automatically
discarding segments of speech from a recorded tape. Their system uses four playback
heads mounted in a rotating drum to scan a magnetically recorded tape. The effective
time length of each speech sample scanned and retained is equivalent to the speech
time on each piece of tape spliced together in the chop-splice method, and is called
the sampling interval. This sampling interval is determined by the revolutions
per minute of the rotating head assembly. The time length of each speech sample
eliminated is likewise similar to the speech time on each piece of tape discarded
in the chop-splice method, and is called the discard interval. The discard interval
is determined by the speed of the magnetic recording tape around the rotating head
assembly. The output of Fairbanks' system is thus equivalent to the output produced
by the chop-splice method.
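
The chop-splice idea can be illustrated on digitized samples. The following is only a minimal sketch, assuming the speech is available as a list of samples; the function name, the 8000 Hz sample rate, and the interval values are hypothetical and are not taken from the original apparatus.

```python
def chop_splice(samples, sample_rate, sampling_ms, discard_ms):
    """Keep `sampling_ms` of signal, drop the next `discard_ms`, and repeat.

    This mimics cutting a tape into pieces and splicing together only the
    retained pieces. It is an illustration, not the Fairbanks circuitry.
    """
    keep = int(round(sample_rate * sampling_ms / 1000.0))
    drop = int(round(sample_rate * discard_ms / 1000.0))
    if keep <= 0 or drop < 0:
        raise ValueError("sampling interval must be positive, discard non-negative")
    out = []
    pos = 0
    while pos < len(samples):
        out.extend(samples[pos:pos + keep])   # retained (sampling) interval
        pos += keep + drop                    # skip the discard interval
    return out


# One second of dummy "speech" at a hypothetical 8000 samples per second.
signal = list(range(8000))

# Equal sampling and discard intervals halve the duration (50% compression).
half = chop_splice(signal, 8000, sampling_ms=25, discard_ms=25)

# Discarding three times as much as is kept leaves one quarter (75% compression).
quarter = chop_splice(signal, 8000, sampling_ms=10, discard_ms=30)
```

Because every retained piece is played at its original speed, the pitch is unchanged; only the total duration shrinks.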

Intelligibility Measurement

Previous studies dealing with the intelligibility of time-compressed speech
have used phonetically balanced spondaic-word lists (lists of two-syllable words
with equal stress on each syllable, e.g., horseshoe) (Fairbanks and Kodman, 1957).
A restricted list of fifty words is presented after time has been allowed for the
listener to familiarize himself with the words in the list, both by studying the
written list and by hearing the list in order while reading it. The lists
are then presented many times in different randomized orders. There is no known
previous investigation of the intelligibility of time-compressed words in context.

Fairbanks, Guttman, and Miron (1957) investigated comprehension of compressed
speech by using 1500-word technical passages presented at high word-per-minute (wpm)
rates,* tested by multiple-choice questions. However, it was not clear to which
of three possible causes a wrong answer might be attributed: (1) the subject
did not hear every speech sound because the length of the discarded sample was so
long that whole sounds were dropped; (2) the distortion introduced by the
interruption frequency of the compression equipment was interfering with the signal;
or (3) there was a problem in perceiving at a rapid rate (i.e., difficulty in
cognitive processing). Unfortunately, this work was done only with a discard
interval of .02 seconds. The only speech compression apparatus commercially
available is the Eltro Information Rate Changer manufactured by Telefonbau und
Normalzeit, Frankfurt-am-Main, Germany. This equipment uses a .04 second discard
interval at all compression ratios, and this interval cannot be altered. Tests run
to determine the comprehension of speeded speech by both blind pupils (Bixler,
Foulke, Amster, & Nolan, 1961; Foulke and Bixler, 1963 & 1964; Foulke, Amster,
Nolan, & Bixler, 1962) and sighted pupils (Orr and Friedman, 1964; Friedman, Orr,
Freedle, and Norris, 1966; Friedman, Orr, and Norris, 1966; Voor, 1962; Wood, 1965)
used this equipment for compressing their materials.

Fairbanks and Kodman (1957) tested word intelligibility as a function
of time compression. Their curves suggest that the optimum rate of interruption
and length of discard interval are not constant for all compression ratios. At
compression ratios up to 75%, a discard interval of .01 sec. appears best; at 80
to 85%, .06 sec.; but at 90%, .05 sec. When listening to connected discourse, a
person has cues to the words from both the context and the grammar of the sentences
(Miller, Heise, and Lichten, 1951; Goldman-Eisler, 1958 and 1961; Pollack, 1964;
Miller, 1962; and Savin, 1963).

The research reported here attempts to test the middle ground between
the intelligibility of words in isolation and the comprehension of long passages by
testing the word intelligibility of short sentences. This research was done in
two parts, the first being a pilot study to see if a difference of 15 to 30
milliseconds (ms.) in the time of presentation of sentences to the two ears would
improve the intelligibility of time-compressed speech. The second part was the
main study and deals with the intelligibility of the same passages used in the
pilot study. These were presented at seven different compression ratios, each
processed with seven different discard intervals. The passages used were from
the Harvard Psycho-Acoustic Laboratory (P.A.L.) Auditory Test No. 12 (Hudgins,
et al., 1947), which consisted of seven passages of 28 questions each. The P.A.L.
Test passages were preceded by the following introductory passage, which was used
to help motivate the students and also to allow a gradual increase of speech rate
to approximately the starting rate of the lists. Here is the introductory passage
and the first ten sentences.

This  is  an  experiment  to  determine  the  intelligibility  of  speeded 
speech.   You  will  hear  a  passage,  which  has  been  specially  processed, 
in  less  time  than  it  took  the  reader  to  read  it.   Even  these  instructions 
to  which  you  are  now  listening  have  been  speeded  by  20%  as  compared 
with the original. The reader's voice sounds normal, however, rather
than  high-pitched,  as  it  would  if  a  phonograph  record  of  it  were  being 
played  at  a  higher  speed  than  that  at  which  it  was  recorded.   Speech 
processed  in  this  manner  is  already  being  used  to  present  material  to 


"Since  the  average  length  of  words  may  vary,  Carroll  (1964)  proposes 
that the syllable-per-minute rate would perhaps be a better measure, with the
speed of 265 syllables-per-minute as the average rate of normal speech production.

blind persons at rates up to 475 w.p.m. A normal speaking range
varies  from  a  low  125  w.p.m.  to  a  rapid  200  w.p.m.  or  more.   The 
blind  high  school  student  averages  only  90  w.p.m.  when  reading  braille, 
while  the  average  sighted  high  school  student  reads  books  at  200 
w.p.m.   Some  blind  people  using  both  hands  simultaneously  to  read 
braille  can  reach  reading  speeds  as  high  as  225  w.p.m.   However, 
this  is  exceptional,  and  less  than  10%  of  the  blind  learn  to  read 
braille  at  all.   This  experiment  is  designed  to  test  the  intelligibility 
of  compressed  speech  at  rates  up  to  800  w.p.m.,  increasing  by 
increments  of  50  words  per  minute  over  a  series  of  seven  passages.   It 
is  contemplated  that  the  results  of  this  study  will  lead  to  improved 
methods  of  presenting  verbal  material  to  the  blind.   It  also  seems 
that  such  rapid  speech  can  be  useful  in  presenting  tape  recorded 
reviews  of  lectures  at  rates  so  high  that  the  50  minute  lecture  can 
be  heard  in  as  little  as  10  to  12  minutes. 

The  following  passages  are  from  the  Psycho-Acoustic  Laboratory  Test 
Number  12,  published  by  Harvard  University. 

List  Number  1 

1.  In  what  country  is  Chicago? 

2.  What  letter  comes  after  Y? 

3.  What  is  the  color  of  grass? 

4.  What number comes before 7?

5.  What  part  of  the  body  do  you  write  with? 

6.  What  comes  out  of  a  kitchen  faucet? 

7.  What  number  is  between  8  and  10? 

8.  Which  is  wetter,  water  or  sand? 

9.  Do you dig holes with a shovel or a rake?

10.   What  is  the  color  of  a  lemon? 

For the pilot study this tape was presented by means of earphones to 49
Radcliffe students, who heard it from seven different tapes, each in the same
ascending order of compression, starting at 50% compression (494 syllables per
minute, or 398 words per minute, with a 1.24 syllable-to-word ratio). The subjects
were screened to eliminate any who had more than 5 decibels hearing loss in either
ear or more than 3 decibels difference between ears, using the Central Institute
for the Deaf (CID) Auditory Test W-2 (Davis et al., 1964, p. 535).

This pilot study, which preceded the main study, was performed to determine
whether there was any difference in intelligibility when the speech was delayed at
one ear by seven different amounts, including zero delay. Table 1 shows the design
for this study. Table 2 shows the amounts of delay, which were obtained by adjustment
of a micrometer head. A playback head was attached to the micrometer so that the
head could be moved along the tape on which the passages were recorded. The micrometer

TABLE 1

Latin Square design for the Pilot Study showing distribution of passages from
the P.A.L. Auditory Test No. 12. Each subject heard 7 passages as shown in
each row, in ascending order of compression ratios. All passages were compressed
at .035 sec. discard interval.*

Speed-up Factor             2.0     2.33    2.67    3.0     3.33    3.67    4.0
Compression Ratio (%)      50.00   57.15   62.40   66.60   70.00   72.75   75.00
Syllables/min.              494     576     658     740     822     905     987
Words/min.                  398     464     531     597     663     730     796

Tape       Subjects
Number     Numbered
1          01-07             2       3       4       5       6       7       8
2          08-14             4       8       7       2       5       3       6
3          15-21             8       2       6       3       7       5       4
4          22-28             7       6       3       4       2       8       5
5          29-35             5       7       2       6       8       4       3
6          36-42             3       5       8       7       4       6       2
7          43-49             6       4       5       8       3       2       7


Each tape (numbered 1-7) was played 7 times. The first subject hearing each tape
had the same material presented at the same time in each ear. Each of the six
subsequent subjects to hear each tape had the material delayed at one
ear (alternately right and left on every question). The seven amounts of delay,
including zero delay, were as follows:


TABLE 2

Amount of delay for each of the 7 conditions in the Pilot Study

Delay Condition No.
& Subjects Number           1        2        3        4        5        6        7
for Table number 1

Delay in Milliseconds      0.0      0.5      1.0      4.0      7.5     15.0     30.0

Micrometer Head
Setting in inches           0     .00375   .00750   .03000   .05625   .11250   .22500

*Reader averaged 199 words/min. or 247 syllables/min. across 7 lists,
for a syllable-to-word ratio of 1.24.

was calibrated in ten-thousandths of an inch to allow precise settings. A
special channel switch was constructed which was activated by a voice key so that
the delayed presentation was automatically changed from the right to the left
ear for every other sentence. Figure 1 is the block diagram of the instrumentation
used for presentation of the compressed speech to the students.
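
The micrometer settings in Table 2 are consistent with a simple relation: the head displacement equals the tape speed multiplied by the desired delay. The sketch below assumes a tape speed of 7.5 inches per second, a value inferred here from the tabled pairs (for example .00375 inch for 0.5 ms) rather than stated for this apparatus; the names are hypothetical.

```python
TAPE_SPEED_IPS = 7.5  # assumption inferred from Table 2, not stated in the text


def head_offset_inches(delay_ms, tape_speed_ips=TAPE_SPEED_IPS):
    """Distance the playback head must be moved along the tape
    to delay the signal by `delay_ms` milliseconds."""
    return tape_speed_ips * delay_ms / 1000.0


# The seven delay conditions of the pilot study, in milliseconds.
delays_ms = [0.0, 0.5, 1.0, 4.0, 7.5, 15.0, 30.0]
offsets = [head_offset_inches(d) for d in delays_ms]
```

Under this assumption the computed offsets reproduce the tabled settings, from 0 inch for no delay to .225 inch for the 30 ms delay.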

It had been originally hypothesized that a delay in one direction might
be helpful for some students, while the opposite direction might be detrimental.
It was thought that the direction of favorable delay, if found, might have a
relationship to the handedness of the individual, as the speech center in the
brain is generally in the opposite cerebral hemisphere from the motor control
center (Kimura, 1961a, 1961b, 1963, 1964). The results showed no significant
differences in intelligibility as measured by the P.A.L. Test between even and
odd items delayed to the left and right ears respectively. This negative finding
does not, of course, preclude the possibility that there may be differences not
detectable with this technique. Figure 2 shows the results of the pilot study
for the 5th delay condition of 7 1/2 milliseconds (ms.), which was the major effect
found (Cramer, 1965). All of the amounts of delay other than 30 ms. (which produced
speech less intelligible than that produced by no delay) showed a slight increase
in intelligibility over the no-delay condition. It should be obvious that when
the intelligibility is near 100%, little can be done to improve it. In other words,
speech has to be somewhat unintelligible in order for an inter-aural difference
in time of presentation to facilitate an improvement.

This writer has named this increase in intelligibility produced by delaying
the speech at one ear the binaural redundancy effect. Figure 3 shows this effect
when plotted as a curve. The sharper drop of the curve at 7 1/2 ms. than at
15 ms. raises the speculation that the maximum effect would occur around 10 ms.,
as shown by the dotted line. This happens to be the average pitch period of the
voice of Mr. Paul Clark, the reader of the passages. The binaural redundancy
effect suggests that the brain of a listener is able to correlate the inputs to
his ears. Figure 4 shows an oscillograph tracing of a vowel sound. It will be
noted that successive pitch periods are highly similar. When we displace one
tracing on the other by the amount of one pitch period (Figure 5) the match is
excellent.
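
The match between a tracing and a copy displaced by one pitch period can be expressed as a correlation. The sketch below uses a synthetic vowel-like waveform (a fundamental plus two weaker harmonics) with the 10 ms pitch period mentioned above; the waveform, the 8000 Hz sample rate, and the function names are all hypothetical stand-ins for the oscillograph tracings.

```python
import math

SAMPLE_RATE = 8000                      # samples per second (hypothetical)
PITCH_HZ = 100.0                        # corresponds to a 10 ms pitch period
PERIOD = int(SAMPLE_RATE / PITCH_HZ)    # 80 samples per pitch period


def vowel_like(n_samples):
    """Synthetic periodic waveform standing in for a vowel tracing."""
    wave = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        wave.append(math.sin(2 * math.pi * PITCH_HZ * t)
                    + 0.5 * math.sin(2 * math.pi * 2 * PITCH_HZ * t)
                    + 0.25 * math.sin(2 * math.pi * 3 * PITCH_HZ * t))
    return wave


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def lagged_correlation(wave, lag, window=10 * PERIOD):
    """Correlate a window of the wave with the same window displaced by `lag`."""
    return correlation(wave[:window], wave[lag:lag + window])


wave = vowel_like(12 * PERIOD)
one_period = lagged_correlation(wave, PERIOD)         # displaced by one pitch period
half_period = lagged_correlation(wave, PERIOD // 2)   # displaced by half a period
```

For this synthetic signal the correlation at a lag of one pitch period is essentially perfect, while a half-period displacement gives a negative correlation, which is one way to picture why a delay near the pitch period might be favorable.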

Incidentally, it has come to our attention that the Air Force has contracted
with Melpar, Inc. in Falls Church, Virginia, to test the effect of delaying speech
to one ear to improve the intelligibility of normal speech in the presence of
noise (Reddinger, 1966). It was found that a 7 1/2 ms. inter-aural difference
in time of presentation of a voice signal allowed the introduction of almost
twice as much noise as normally present with no loss in intelligibility.

In the main study, the same passages were read by the same speaker, with
the same introductory passage. However, since speech became seriously distorted
only at the 75% compression ratio (equal to 4 times normal speed, or 987 syllables
per minute, or 796 words per minute), it was decided to drop the 50% compression
ratio (twice normal speed, 494 syllables per minute or 398 words per minute) from
the study and start with a compression ratio of 57.1% (2.33 times normal speed, 576
syllables per minute, or 464 words per minute). The next rate used in the pilot
study, namely 62.4% compression (2.67 times normal speed, 658 syllables per minute
or 531 words per minute), was also omitted. This allowed the addition of
two higher rates at the end of the series, beyond the compression ratio of 75%
(4 times normal speed) used in the pilot study. Each of the seven passages at
seven compression ratios was then processed at seven different discard intervals
from 10 to 95 ms. The design for this part of the study is shown in Tables 3
and 4.

This design was used to determine the optimal discard interval for
maximal intelligibility of speech at each of the seven rates shown in Table 3.
Also to be tested was whether a 10 ms. inter-aural time difference
of presentation would significantly improve intelligibility. A third condition,
the use of a loudspeaker in a large reverberant classroom, was also to be tested.

Voor (1962) expressed concern that the rapidity and short duration of
each speech sound in highly compressed speech could cause a masking by reverberation
when such sounds are presented by means of a loudspeaker in an ordinary classroom.
To avoid this possibility, he presented all his material by means of earphones.
Orr and Friedman (1964a, 1964b, 1964c), Friedman, Orr, Freedle, & Norris (1966),
and Friedman, Orr, & Norris (1966) used loudspeaker presentation, however, because
they felt it represents a closer approximation to what is practical in the
typical classroom. The equipment was set up as shown in Figure 6.

In the pilot study, the answers given by each student were scored as
either right or wrong for each question by their correspondence to the correct
answers as published with the P.A.L. Test Number 12. A few modifications were
necessary. For instance, the question, "How many states are there in the Union?"
is now correctly answered by 50 rather than 48, and the color of the cloth on a
pool table is not always green today, decorators having entered the field. In
one case the question, "Which is larger, a dog or a horse?" was misread "house",
so that house had to be allowed as a correct answer. There were several questions,
however, where an answer could be given without assurance that the question was
correctly perceived. Such an example is "What do you use to unlock a door?"
The answers included key, knob, and hand. The latter answer could also have
been given if the subject thought the question was, "What do you use to knock
a door?" (Asher, 1958). Because of such uncertainties, in this final part of
the study, subjects were asked to write each utterance as they heard it, rather
than answering the question as was done in the pilot study. Their responses were
then scored on a word-for-word basis. At first a syllable-for-syllable scoring
basis was proposed. When we found responses such as "Which is Dr. Michael Day?"
for the question "Which is darker, night or day?", "Does California grow the
orange?" for the question, "Does a cow have kittens or horns?" and "What is
a Catholic worth?" for "What does a cat lick with?", it became obvious that
such scoring would give spuriously high scores, even when the response bore
little relation to the original in meaning. We did allow plurals for singulars
and "come" for "comes" and vice versa.
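
A word-for-word scoring rule of this kind might be sketched as follows. The exact scoring procedure is not specified beyond the allowances mentioned above, so the details here are assumptions: words are matched regardless of position, a trailing "s" is stripped so that plurals match singulars and "come" matches "comes", and the score is the fraction of target words found in the response. All names are hypothetical.

```python
import re
from collections import Counter


def normalize(word):
    """Lower-case a word and strip a trailing 's' (crude plural/verb leveling)."""
    w = word.lower()
    if len(w) > 2 and w.endswith("s"):
        w = w[:-1]
    return w


def tokens(text):
    """Split a sentence into normalized words, ignoring punctuation."""
    return [normalize(w) for w in re.findall(r"[A-Za-z']+", text)]


def word_score(target, response):
    """Fraction of the target's words that also appear in the response."""
    want = Counter(tokens(target))
    got = Counter(tokens(response))
    matched = sum((want & got).values())   # multiset intersection
    return matched / sum(want.values())


# Only "which", "is" and "day" survive in this response.
score_day = word_score("Which is darker, night or day?", "Which is Dr Michael Day?")

# "come" is accepted for "comes".
score_faucet = word_score("What comes out of a kitchen faucet?",
                          "what come out of a kitchen faucet")
```

A response that keeps the rhythm of the question but few of its words earns a low word score under this rule, which is the property that motivated abandoning syllable-for-syllable scoring.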

The additional time required for subjects to write what was heard would
have lengthened the test time to more than one and one half hours; therefore, each
list was cut in half arbitrarily. When the sheets were all collected and scored,
the data were first plotted by discard interval. It appeared that something
peculiar was occurring when the curves were plotted with intelligibility versus
discard interval (as in Figures 7, 8, & 9). Because of the shapes of the tails
of the curves and the shift of the optimum points to the right as the speed
increases, the intelligibility was then plotted as a function of the sampling
interval. When the raw data were fitted by a least squares solution for a cubic
equation, we obtained the smoothed curves shown in each case (Figures 10, 11,
and 12). The broken line shows the raw data.
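
A least squares cubic fit of this kind can be sketched in a few lines. This is not the original computation: the solver below builds the normal equations and uses Gaussian elimination, the optimum is located by a simple grid search, and the data points are invented purely for illustration.

```python
def fit_cubic(xs, ys):
    """Least-squares coefficients [c0, c1, c2, c3] of c0 + c1*x + c2*x^2 + c3*x^3."""
    n = 4
    # Normal equations: A[i][j] = sum of x^(i+j), b[i] = sum of y * x^i.
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / A[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Value of the fitted cubic at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def peak_of(coeffs, lo, hi, steps=1000):
    """Grid search for the x in [lo, hi] where the fitted curve is highest."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda x: evaluate(coeffs, x))


# Hypothetical data: sampling interval (ms) versus per cent of words correct.
intervals_ms = [5.0, 10.0, 15.0, 20.0, 30.0, 45.0]
percent_correct = [60.0, 82.0, 90.0, 86.0, 70.0, 40.0]
curve = fit_cubic(intervals_ms, percent_correct)
optimum_ms = peak_of(curve, 5.0, 45.0)
```

The smoothed curve is the cubic given by `curve`, and `optimum_ms` plays the role of the arrow-head marking the highest point on each plotted curve.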

It  will  be  observed  that  the  optimum  points  (highest  intelligibility  point 
on  each  curve)  indicated  by  arrow  heads  lie  nearly  in  a  straight  line  from  top  to 
bottom,  showing  that  the  optimum  is  a  function  of  the  sampling  interval  rather  than 
the  discard  interval.   Fairbanks  was  a  phonetician  and  was  concerned  with  the 

TABLE 3

Latin Square Design for the Main Study showing order of P.A.L. Passages 2-8,
renumbered 1-7 respectively for this study. Blocks B to G were arranged by
permuting the numbers in each cell as designated by Winer (1962).

Master A--Discard Interval 10 Milliseconds

List numbers as announced
on the tape heard by students     1       2       3       4       5       6       7

Compression Ratio               57.10   66.60   70.00   72.70   75.00   76.90   78.57
Syllables/min. Rate              576     740     822     905     987    1069    1152
Words/min. Rate                  464     597     663     730     796     862     928
Speed-up Factor                  2.33    3.00    3.33    3.67    4.00    4.33    4.67

Tape       Subjects
Number     Numbered
A-1        01-07                  6       7       1       2       3       4       5
A-2        08-14                  1       5       4       6       2       7       3
A-3        15-21                  5       6       3       7       4       2       1
A-4        22-28                  4       3       7       1       6       5       2
A-5        29-35                  2       4       6       3       5       1       7
A-6        36-42                  7       2       5       4       1       3       6
A-7        43-49                  3       1       2       5       7       6       4

The above numbers are P.A.L. Auditory Test #12, passages
2-8, renumbered 1-7 respectively.

TABLE 4

Tape Series A to G Showing Discard Interval for Each

Tape Series          Discard Interval
Designation          in Milliseconds

A                          10
B                          20
C                          35
D                          50
E                          65
F                          80
G                          95


The passage numbers for each subsequent block after A, as shown here, were
cyclically permuted by increasing the passage numbers in block A by 1, except
that number 7 became 1, as each was changed to render the next block in
alphabetic order.
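
The cyclic permutation described in this note can be written out directly. The sketch below takes block A from Table 3 of the main study; three of its cells are illegible in the available copy and are filled in here as passage 4 on the assumption that every row contains each passage exactly once. The function names are hypothetical.

```python
# Block A of the main-study Latin square (rows = tapes A-1 to A-7).
BLOCK_A = [
    [6, 7, 1, 2, 3, 4, 5],
    [1, 5, 4, 6, 2, 7, 3],
    [5, 6, 3, 7, 4, 2, 1],
    [4, 3, 7, 1, 6, 5, 2],
    [2, 4, 6, 3, 5, 1, 7],
    [7, 2, 5, 4, 1, 3, 6],
    [3, 1, 2, 5, 7, 6, 4],
]


def next_block(block, n_passages=7):
    """Increase every passage number by 1, with the highest wrapping to 1."""
    return [[p % n_passages + 1 for p in row] for row in block]


def is_latin_square(block):
    """True when every row and every column contains each passage once."""
    full = set(range(1, len(block) + 1))
    rows_ok = all(set(row) == full for row in block)
    cols_ok = all(set(col) == full for col in zip(*block))
    return rows_ok and cols_ok


# Blocks A through G, each derived from the previous one.
blocks = [BLOCK_A]
for _ in range(6):
    blocks.append(next_block(blocks[-1]))
```

Adding 1 to every cell preserves the Latin square property, so each of the seven blocks still presents every passage once at every compression ratio.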

Figure 6. Block Diagram of Playback Apparatus for Main Study

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redundancy  of  phonemes.   He  was  interested  in  seeing  how  much  could  be  discarded 
and  still  have  intelligible  speech,  and  therefore  he  plotted  his  curves  as  a 
function  of  the  discard  interval.   From  a  perceptual  point  of  view,  however,  it 
should  be  obvious  that  the  listener  has  no  knowledge  of  what  has  been  deleted, 
and  can  only  perceive  what  is  present.   When  the  optimum  sampling  interval  at 
each compression ratio is examined, it can be observed that approximately 15 ms.
is the optimum sampling interval for every compression ratio. This, incidentally,
is what is required to ensure that, on the average, there is at least one complete
pitch  period  of  the  voice  in  each  sample  of  speech. 

In view of all the work being done with the Harmonic Compressor (Golden,
1966) and the Haskins Pattern Playback equipment (Cooper et al., 1951, 1952;
Delattre et al., 1956; Liberman et al., 1957), where pitch periods are not considered,
it seemed somewhat puzzling that the pitch period appeared so important. At
first,  it  seemed  that  this  might  be  a  coincidence.   However,  the  delay  to  one  ear 
did  seem  related  to  pitch  periods,  and  it  is  fairly  easy  to  understand  the  rationale 
for  this. 

During the past summer, some very short sampling intervals, less than
a pitch period, were derived by means of the Adage Ambilog 200 hybrid
analog-digital computer (Grandine and Hagan, 1965). With Fairbanks' compressor, it is
not possible to get samples shorter than a pitch period except at very short
discard intervals of 10 ms. and at rates above 50% compression. For such conditions,
Fairbanks & Kodman (1957) complained that the frequency of interruption obtruded
on the first formant of the speech frequencies. From our work, it appears that
samples  of  a  half  a  pitch  period  will  simply  double  the  pitch  of  the  voice  so 
processed,  just  as  one  can  do  by  playing  a  recording  of  a  voice  at  twice  normal 
speed.   All  these  findings  tend  to  indicate  that  speech  is  processed  mentally 
over the span of a pitch period, and the pitch that is perceived is determined
by  the  length  of  time  between  fundamental  cycles  irrespective  of  half  periods 
added  to  a  fundamental.   A  soprano  voice  will  have  almost  twice  the  number  of 
pitch  periods  in  a  given  time  as  an  average  male.   This  has  led  telephone  engineers 
to  comment  that  women  convey  half  as  much  information  in  a  given  time  as  men, 
because  their  voices  are    twice  as  redundant.   However,  short  wave  radio  stations 
capitalize  on  this,  because  a  woman's  voice  has  twice  as  many  chances  (twice  as 
many  pitch  periods)  to  get  through  static  intact  as  a  man's  voice.   In  view  of 
this  phenomenon  it  would  seem  that  a  woman's  voice  might  be  expected  to  be  more 
intelligible  when  compressed  at  high  ratios,  because  twice  as  many  pitch  periods 
can  be  discarded,  leaving  an  equal  number  to  be  perceived  as  with  the  average 
male voice. Comparisons made by Foulke et al. (1962) and others have found
that  the  speaker  you  have  heard  on  our  tape  was  the  most  comprehensible,  when 
compressed,  of  a  number  of  speakers  including  both  men  and  women.   Our  research 
suggests  at  least  two  reasons  for  this. 

When the pitch period record of the voice of our speaker was examined,
it was found that in twenty-five vowels sampled, his fundamental frequency never
went below 95 Hertz* (cycles per second) nor above 105 Hertz. This is a ± 5%
variation. A similar study of Vice-President Humphrey's voice, however, gives a
range from a low of 112 Hertz to a high of 187 Hertz, ± 25%. His voice does not
compress well.
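
The two percentages quoted above can be reproduced with a short Python sketch. It assumes that the variation is expressed as half the frequency range divided by the midpoint of the range; the report does not state its formula, and the function name is illustrative.

```python
def pitch_variation_percent(low_hz: float, high_hz: float) -> float:
    """Half-range of fundamental frequency as a percentage of the midpoint."""
    midpoint = (low_hz + high_hz) / 2.0
    half_range = (high_hz - low_hz) / 2.0
    return 100.0 * half_range / midpoint


# Ranges quoted in the text (Hertz).
print(f"95 to 105 Hz:  +/- {pitch_variation_percent(95, 105):.1f}%")
print(f"112 to 187 Hz: +/- {pitch_variation_percent(112, 187):.1f}%")
```

Under this assumption the first range gives 5% and the second rounds to 25%, matching the figures in the text.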

The  women  who  read  for  the  American  Printing  House  for  the  Blind  are    quite 
dramatic  in  their  voicing,  and  consequently  have  a  very  large  dynamic  pitch  range. 


*Hertz is the new designation for cycles per second proposed and used
by the American Institute of Physics.



These are the readers used by Foulke et al. (1962) and others. For the rotating
head  type  of  compression,  using  a  periodic  discard,  it  would  seem  that  a  woman 
with  a  high-pitched  voice  but  limited  dynamic  range  would  give  superior  results. 

Earlier  it  was  mentioned  that  this  research  has  suggested  two  reasons 
why  previous  research  has  found  a  low-pitched  man's  voice  superior  to  any  of 
several  women's  voices.   The  first  reason  was  that  no  known  study  has  included 
the  voice  of  a  woman  with  a  narrow  dynamic  pitch  range.   The  second  reason  is 
that  the  current  system  for  compressing  speech,  namely  the  rotating  head  assembly 
of the Tempo Regulator, has a fixed discard interval of 40 ms., regardless of the
compression ratio. This means that at 50% compression, or 2 times the normal
rate, the sampling interval is likewise 40 ms., almost 3 times the optimum for
a low-pitched man's voice, but seven to nine times the optimum for a high-pitched
woman's voice. What is needed is a compressor that automatically adjusts the
sampling interval to the optimum sampling interval for the voice being processed.
Rather than a strictly periodic device, what seems needed is
a pitch-synchronous model such as those recently patented by Denis Gabor
(1950, 1965). His speech compressor has 8 different sampling intervals within
an octave, with automatic switching to match the optimum sample length. With
such equipment it should be possible to achieve improved intelligibility at higher
compression ratios than is now possible, and consequently, a higher percentage
of comprehension. Figure 13 shows the sampling interval of the Tempo Regulator
at each compression ratio from 10% to 50%.
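
The dependence of the sampling interval on the compression ratio for a fixed discard interval can be sketched as follows. The sketch assumes a strictly periodic sample-and-discard cycle, in which the compression ratio is discard / (discard + sample); this reproduces the statement that the sampling interval equals the 40 ms. discard interval at 50% compression. The function and constant names are illustrative, and the 15 ms. optimum is the figure reported earlier for the present speaker.

```python
def sampling_interval_ms(discard_ms: float, compression_percent: float) -> float:
    """Retained sample length for a periodic sample/discard compressor.

    Assumes compression = discard / (discard + sample), which gives
    sample = discard * (1 - c) / c with c the compression as a fraction.
    """
    c = compression_percent / 100.0
    if not 0.0 < c < 1.0:
        raise ValueError("compression must be strictly between 0 and 100 percent")
    return discard_ms * (1.0 - c) / c


TEMPO_REGULATOR_DISCARD_MS = 40.0  # fixed discard interval cited in the text
OPTIMUM_SAMPLE_MS = 15.0           # optimum sampling interval reported above

for percent in (10, 20, 30, 40, 50):
    sample = sampling_interval_ms(TEMPO_REGULATOR_DISCARD_MS, percent)
    print(f"{percent}% compression: sample {sample:.1f} ms., "
          f"{sample / OPTIMUM_SAMPLE_MS:.1f} times the 15 ms. optimum")
```

On this assumption the sampling interval of a fixed-discard device is longest at low compression and falls to 40 ms. only at 50%, so it never reaches the 15 ms. optimum within the 10% to 50% range.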

The optimum points, indicated by arrowheads in Figures 11 and 12, are
replotted in Figure 14 as intelligibility versus word-per-minute rate.

Superimposing Figure 2 from the Pilot Study on Figure 14 of the main
study renders Figure 15. The differences in the intelligibility of the two studies
reflect the lower intelligibility scores when the P.A.L. is scored on a
word-for-word basis rather than by correct answers.

In  concluding  it  should  be  made  clear  that  the  intelligibility  at  the 
high word rates reported in this research does not by any means imply that comprehension
of  long  connected  continuous  discourse  can  also  be  obtained  at  comparable  rates. 
Continuous  passages  have  also  been  processed  at  the  optimum  rates  for  maximum 
intelligibility, and at about 2.5 times normal (60% compression) they become
meaningless,  even  though  some  words  are    intelligible.   This  is  probably  due  in 
part to lack of time for cognitive processing. It is also due to a "double-
take"  effect;  that  is,  one  word  is  transformed  into  another,  for  example,  the 
word  "thriving"  is  heard  as  "fighting"  at  2.5  times  normal  speed.   This  appears 
due  to  the  /th/  being  confused  for  an  /f/  when  the  /r/  in  thriving  is  clipped 
short, and the /v/, when clipped short, sounding like a /t/. In a long continuous
passage,  such  a  confusion  can  interfere  with  the  comprehension  of  subsequent 
sentences  presented  at  a  rate  two  to  three  times  normal.   It  would  seem  that 
such  a  confusion  is  analogous  to  the  confusion  resulting  from  losing  one's  place 
while  watching  a  rapidly  paced  reading  film. 
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CHAPTER  XIX 
SUMMARY AND CONCLUSIONS

On the final morning of the conference, the chairmen of the preceding
day's  discussion  groups  summarized,  for  the  entire  conference  membership,  the 
deliberations  of  their  groups.   Since  each  group  was  given  a  free  hand  in 
choosing  topics  to  be  discussed,  a  good  deal  of  common  ground  was  covered. 
For  this  reason,  no  effort  has  been  made  to  reproduce  an  exact  transcript 
of  each  chairman's  summary.   Instead,  the  summaries  have  been  combined  to 
produce  a  single  list  of  recommendations. 

An  Economic  Source  For  Time 
Compressed  or  Expanded  Speech 

Perhaps the most frequent and most urgent recommendation made by
conference members was for the establishment of an adequate source of supply for
time  compressed  or  expanded  recorded  speech.   It  was  felt  that  further  progress 
in  developing  applications  of  time  compressed  or  expanded  speech  depends  upon 
the  organization  of  a  center  (or  centers),  capable  of  supplying  rate  controlled 
speech  of  high  quality,  in  sufficient  quantity  to  meet  the  needs  of  those  who 
would  use  it,  and  at  a  low  enough  price  to  make  its  use  economically  feasible. 
It  was  pointed  out  that,  as  matters  presently  stand,  it  is  not  possible  to 
make  realistic  plans  for  the  incorporation  of  time  compressed  recorded  speech 
into  the  educational  process,  even  for  purposes  of  demonstration.   Current 
costs  would  be  prohibitive,  and  existing  sources  could  not  meet  the  demand 
for  a  large  quantity  of  material  over  a  long  period  of  time  that  would  be 
required.

Needed  Research 

Conference  members  recognized  an  urgent  need  for  further  research,  of 
both psychoeducational and technological import. Many problems, resolvable by
research,  were  mentioned.   Although  it  will  not  be  possible  to  provide  a  thorough 
statement  of  each  problem  here,  an  effort  will  be  made  to  summarize  them  in  a 
general  way  in  the  belief  that  such  a  summary  may  be  useful  to  those  interested 
in research.

The  present  state  of  ignorance  regarding  the  nature  of  listening  tasks, 
and  of  training  methods  for  promoting  effective  listening,  was  felt  to  be  a 
problem  of  central  importance.   It  was  pointed  out  that,  because  so  little  is 
known  about  listening  of  any  kind,  it  would  be  a  mistake  to  confine  our  research 
interests  to  listening  tasks  in  which  the  speech  has  been  accelerated.   Much  of 
what  is  learned  about  the  development  of  listening  skills  may  be  applicable  regardless 
of  the  word  rate.   Presenting  information  at  an  accelerated  word  rate  may  complicate 
the  listening  task,  but  the  impact  of  accelerated  speech  upon  the  perceptual  and 
cognitive  operations  employed  by  the  individual  engaged  in  a  listening  task, 
cannot be ascertained until these perceptual and cognitive operations are, themselves,
more clearly understood. With such understanding, the constitution of training
experiences could be guided by more rational and less purely empirical considerations.
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The  Relationship  Between  Reading  and  Listening 

A problem related to the one just discussed is the clarification of
the  relationship  between  reading  and  listening  at  both  normal  and  accelerated 
word  rates.   Such  clarification  would  permit  more  informed  decisions  regarding 
the  circumstances  under  which  accelerated  listening  would  serve  as  supplementary 
to  or  as  a  substitute  for  normal  reading.   Also,  it  would  provide  a  basis  for 
gauging  the  extent  to  which  those  procedures  developed  for  the  improvement  of 
reading  rate  could  be  generalized  to  the  improvement  of  listening  rate. 

Problems  of  Measurement 

It  was  generally  recognized  that  more  thinking  ought  to  be  done  about 
what  is  usually  measured,  and  what  ought  to  be  measured  in  tests  of  listening 
comprehension. Researchers have, for the most part, preferred multiple choice
tests  because  of  their  statistical  reliability  and  ease  of  administration  and 
scoring.  However,  such  tests  are  valid  only  to  the  extent  that  they  measure  the 
factors  involved  in  listening  comprehension.  It  may  be  desirable  to  consider 
other  kinds  of  tests  as  well,  tests  requiring  recall  and  reconstruction. 

Another  urgent  measurement  problem,  recognized  by  many,  has  to  do  with 
the  specification  of  oral  reading  rates.   Common  practice  has  been  to  specify 
in  terms  of  the  number  of  words  spoken  per  minute.   However,  this  approach  results 
in  considerable  variability  in  the  productions  of  different  readers  and  in  different 
productions  of  the  same  reader.   One  reason  is  that  longer  words  take  more  time  to 
pronounce,  and  are  therefore  produced  at  a  slower  rate.   Hence,  those  listening 
selections  with  longer  average  word  length  will  appear  to  be  read  at  a  slower 
word  rate.   Some  evidence  (See  Chapter  XIII)  suggests  that  syllable  rate  may 
provide  a  less  variable  and  more  meaningful  specification.   Further  research 
on  this  problem  is  clearly  indicated. 
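
The effect of average word length on apparent word rate can be illustrated with a small Python sketch. The two passages and their figures are hypothetical and chosen only for illustration; the function name is likewise an assumption. For comparison, the rates in the Master A table of Chapter XVIII (576 syllables and 464 words per minute) imply about 1.24 syllables per word.

```python
def word_rate(syllable_rate: float, syllables_per_word: float) -> float:
    """Words per minute implied by a syllable rate and an average word length."""
    return syllable_rate / syllables_per_word


# Hypothetical passages read at the same syllable rate (syllables per minute).
SYLLABLE_RATE = 250.0
passages = {
    "short-word passage": 1.25,  # average syllables per word (hypothetical)
    "long-word passage": 2.0,    # average syllables per word (hypothetical)
}

for name, syllables_per_word in passages.items():
    wpm = word_rate(SYLLABLE_RATE, syllables_per_word)
    print(f"{name}: {SYLLABLE_RATE:.0f} syllables/min = {wpm:.0f} words/min")
```

Two readings that are identical in syllable rate can thus differ widely in word rate, which is one way of seeing why syllable rate may be the less variable specification.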

Problems  of  Experimental  Design 

Conference  participants  found  much  to  criticize  about  the  conception 
and  design  of  experiments  dealing  with  compressed  or  expanded  speech.   A 
frequent  recommendation  was  that  more  careful  attention  should  be  given  to  the 
populations  that  are  sampled  when  subjects  are  recruited  for  experiments.   It 
was  pointed  out  that  researchers  have  too  often  sampled  from  college  populations 
for  reasons  of  convenience  with  the  hope  that  their  results  would  generalize  to 
groups  such  as  blind  school  children,  typical  adults,  etc.   Another  general 
criticism  was  that,  for  reasons  of  economy  of  time  and  effort,  experimenters  have 
tended  to  base  their  conclusions  upon  results  obtained  from  relatively  naive 
subjects  who  have  been  given  relatively  brief  exposures  to  time  compressed  or 
expanded  speech.   It  was  felt  that  we  must  soon  come  to  grips  with  the  problem 
of  designing  and  executing  studies  which  provide  significant  exposure  to  time 
compressed  or  expanded  speech  over  a  long  period  of  time.   It  was  regarded  as 
especially  important  that  some  of  these  longitudinal  studies  involve  young 
children  who  may  be  able  to  master  very  fast  word  rates,  just  as  young  children 
apparently  can  master  foreign  languages  with  relative  ease. 

Organismic  Variables 

A  host  of  organismic  variables,  the  contributions  of  which  are   not 
sufficiently  understood,  were  mentioned,  and  some  were  mentioned  often  enough 
and  by  enough  people  to  reflect  a  general  concern  or  interest.   Included  were 
relatively unmodifiable states, pertaining to basic constitution, such as
mental capacity (with special reference to mental retardation) and degree
of perceptual handicap, and relatively modifiable states such as motivation, interest,
fatigue, initial resistance to very rapid speech, and attentive adjustment. The
two variables mentioned last appeared to be of special interest. Many participants
reported that they had sensed an initial resistance to very rapid speech on the
part of some listeners. They felt that an inability to overcome this resistance
might limit seriously the practicability of the technique, and they recommended
the development of procedures to overcome this resistance. It was felt that,
because of the reduced redundancy of accelerated speech, the listener's attentive
adjustment becomes a more critical problem. The normal distractions, with
which the listener to normal speech has learned to contend, are likely to interfere
seriously with the comprehension of accelerated speech. It was recommended that
the relationship between attentive adjustment and comprehension, as word rate is
increased, be given serious experimental attention.

Stimulus Variables

One  frequently  discussed  class  of  stimulus  variables  pertained  to  the 
characteristics  of  the  accelerated  speech  display.   It  was  pointed  out  that 
aural  communication  may  depend  upon  somewhat  different  perceptual  and  cognitive 
operations  than  visual  communication  with  the  result  that  different  vocabulary, 
sentence  structure,  format,  etc.,  may  be  required  for  maximum  efficiency  of 
communication.

It  might  be  desirable  to  consider  surrendering  some  of  the  time  gained  by 
the  acceleration  of  word  rate,  by  introducing  pauses  at  strategic  points  in  an 
accelerated  listening  selection.   Such  pauses  might  provide  needed  time  for  implicit 
rehearsal,  stimulus  encoding,  or  some  such  operation  upon  which  the  demonstration 
of  comprehension  may  depend. 

It might be desirable to precede an accelerated listening selection with a
list of the unfamiliar words in that selection. Presumably, this selective preview
would increase the discriminability of such words and thus increase the chances
for  their  accurate  reception  during  presentation  of  the  selection. 

Since familiar selections can be understood more easily than unfamiliar
selections at high compressions, it may be feasible to use highly compressed versions
of previously encountered selections for purposes of review. Word rates appropriate
in  this  connection  might  be  considerably  faster  than  the  word  rates  suitable  for 
original  presentation. 

One  of  the  major  disadvantages  of  reading  by  listening,  compressed  or 
otherwise,  in  comparison  to  visual  reading,  is  the  relatively  greater  difficulty 
with  which  emphasis  and  retrieval  can  be  accomplished.   The  visual  reader  can  vary 
his  reading  rate  in  accordance  with  the  demands  of  the  material  being  read,  and  can 
retrace with ease. He can skim through a book rapidly and find desired information
quickly.   The  person  who  reads  by  listening,  on  the  other  hand,  finds  it  difficult, 
with  existing  equipment,  to  retrace  or  to  vary  his  listening  rate.   Finding  a 
particular  item  of  information  is  often  quite  expensive  in  time.   It  was  felt  that 
with  properly  recorded  material  and  properly  designed  playback  equipment,  the 
problems  of  the  aural  reader  could  be  substantially  reduced.   For  instance,  if 
time  compressed  tape,  with  indexing  signals  recorded  on  it  at  appropriate  points 
in  a  listening  selection,  could  be  played  on  a  tape  recorder  that  was  variable 
with  respect  to  the  speed  and  direction  of  tape  motion,  selective  attention  and 
retrieval  would  be  greatly  facilitated.   If,  in  addition,  this  tape  recorder  were 
capable  of  moderate  and  variable  compression,  the  disadvantages  of  aural  reading 
could  be  further  reduced. 
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Finally,  it  was  mentioned  repeatedly  that  the  compression  to  be  desired 
would  depend,  in  part,  upon  the  kind  of  material  to  be  heard.   It  was  recommended 
that,  although  a  beginning  has  been  made,  a  good  deal  of  research  is  still  required 
to  clarify  the  interaction  of  kind  of  listening  material  and  word  rate  on  listening 
comprehension.

Other  Stimulus  Variables 

Other  stimulus  variables,  such  as  the  reader's  voice  quality,  his  reading 
style,  and  the  initial  reading  rate,  or  the  rate  before  compression,  received 
frequent  mention.   There  was  also  some  discussion  of  the  contribution  of  individual 
speech  sounds  to  the  intelligibility  of  compressed  words.   It  was  felt  that  if 
speech  sounds  were  affected  differentially  by  compression,  and  if  they  contributed 
differentially to word discriminability, the interaction of these factors would
have  to  be  understood  in  order  to  predict  the  consequences  of  compression. 

Technological  Research 

A  strong  need  was  felt  for  further  development  of  instruments  that  compress 
or  expand  speech  by  either  electromechanical  or  electronic  sampling.   One  extremely 
valuable  objective  would  be  a  speech  compressor  with  good  signal  quality  that 
could  be  sold  cheaply  enough  to  permit  individual  ownership.   It  was  emphasized 
repeatedly  that  the  current  expense  associated  with  speech  compression  equipment 
imposes  a  serious  limitation  upon  the  development  of  the  area.   Another  insistent 
recommendation  was  for  the  research  needed  to  guide  the  development  of  playback 
equipment,  suitable  for  the  reproduction  of  time  compressed  speech.   It  was  pointed 
out  that  many  signal  distortions  which  are   not  critical  in  the  reproduction  of 
speech  at  normal  rates,  may  become  critical  at  accelerated  rates.   Knowledge  of  the 
contributions  of  various  kinds  of  distortion  should  be  used  in  stating  the  design 
criteria  for  playback  equipment.   The  choice  of  earphones  or  loudspeaker  constitutes 
a  simple  illustration.   It  has  been  found  that  highly  compressed  words  are    significantly 
more  intelligible  when  heard  by  means  of  earphones,  instead  of  a  loudspeaker.   This 
is  undoubtedly  due  to  the  damping  problems,  inherent  in  loudspeakers,  that  are 
avoided  in  earphones.   Other  factors  to  be  considered  in  the  design  of  a  reproducer 
might be continuously variable control, in both directions, of tape speed, and the
ability  to  record  indexing  signals  on  tape  that  would  be  reproduced  audibly  at  the 
high  tape  speeds  used  during  scanning  operations.   Similar  capability  would,  of 
course,  be  desirable  for  record  reproducers.   In  this  connection,  the  relative 
advantages  of  tapes  on  open  reels,  tape  cartridges,  and  records  should  be  studied. 

A  study  should  be  undertaken  to  determine  the  feasibility  of  telephone  lines 
as  a  means  of  distributing  time  compressed  listening  selections.   For  instance,  a 
system  is  conceivable  in  which  a  listener,  by  dialing  the  appropriate  number,  could 
be  connected  with  a  central  facility  capable  of  supplying  him  with  any  listening 
selection  in  storage  at  any  desired  word  rate. 

Several  methods  for  the  time  compression  or  expansion  of  speech  are   available 
or  under  development.   Some  examples  are  compression  by  electromechanical  sampling, 
compression  by  means  of  a  computer,  harmonic  compression,  and  compression  by  accelerated 
playback  of  recorded  material.   The  outputs  of  these  methods  should  be  compared 
carefully with respect to response characteristics, word intelligibility, and
comprehension of connected discourse. Such comparisons are important, because the
methods  differ  considerably  with  respect  to  such  factors  as  cost  and  simplicity. 
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It  was  recommended  that  some  consideration  be  given  to  the  possibility  of 
combining  methods  of  speech  compression.   The  method  of  playing  a  tape  or  record 
at  a  faster  speed  than  the  one  used  during  recording,  though  it  introduces  pitch 
distortion,  has  the  advantage  of  being  inexpensive  and  simple.   Its  distortion  can 
be tolerated when acceleration is moderate, and the method might be used for further
tailoring, in accordance with individual preferences, of the word rates of selections
that have already been moderately compressed by the more satisfactory sampling method.
Developing  Uses  For  Time 
Compressed  Or  Expanded  Speech 

The  application  of  speech  compression  techniques  to  the  reading  problems 
of  blind  people  has  received  considerable  attention  already.   However,  it  was  the 
general  feeling  of  conference  participants  that  many  other  uses  should  also  be 
explored.   It  was  recommended  that  studies  be  conducted  to  determine  potential 
target  populations  for  compressed  or  expanded  speech,  and  that  projects  be  organized 
to demonstrate the usefulness of time compressed or expanded speech in new applications.
It  was  suggested  that  there  might  be  a  considerable  potential  for  time  compressed 
speech  as  a  general  educational  tool.   Compressed  speech  might  also  serve  a  diagnostic 
function  in  the  investigation  of  personality  or  perceptual  handicap.   It  has  already 
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shown  some  promise  as  a  technique  for  diagnosing  the  underlying  reason  for  hearing 
loss.   Expanded  speech  might  be  useful  to  those  learning  shorthand  or  typing. 
Passages,  presented  initially  at  a  very  slow  word  rate,  could  be  copied  by  shorthand 
or  typing  students  and  the  word  rate  could  be  gradually  increased,  as  their  skill 
permitted.   Expanded  speech  techniques  might  also  be  used  to  slow  the  word  rate  for 
students  of  a  foreign  language  or  for  patients  in  speech  therapy.   Mentally  retarded 
children  might,  under  some  circumstances,  receive  benefit  from  either  time  expanded 
or  time  compressed  speech.   Many  other  applications  may  be  imagined.   It  was  the 
general  conviction  of  conference  participants  that  these  applications  should  be 
identified  and,  where  feasible,  developed. 

Standardization  of  Terminology 
And  Equipment 

The  lack  of  a  standard  and  generally  understood  vocabulary  of  terms  was 
considered  by  conference  participants  to  be  a  serious  problem.   For  example,  to 
some  people,  "rapid  speech"  is  the  term  reserved  for  speech  that  has  been  accelerated 
by reproducing a tape or record at a faster speed than the speed used during recording.
To  others,  it  has  a  more  general  significance.   Similarly,  "compressed  speech",  to 
many  people,  means  speech  that  has  been  compressed  by  the  sampling  method,  while  to 
others,  it  is  speech  reproduced  in  less  than  the  original  time,  regardless  of  method. 
Some  people  describe  compressed  speech  in  terms  of  the  percent  of  compression,  others 
in  terms  of  the  percent  of  acceleration,  and  still  others  in  terms  of  the  compressed 
word  rate.   It  was  recommended  that  steps  be  taken  to  arrive  at  a  general  agreement 
regarding  such  matters,  and  that,  in  the  meantime,  some  thought  be  given  to  the 
publication  of  a  glossary  of  the  various  terms  in  common  use  today,  together  with 
their  several  meanings,  where  necessary. 

The  need  for  standardization  of  equipment  was  also  urged.   It  was  pointed 
out  that  the  interfacing  problems  arising  from  lack  of  compatibility  of  recording 
and  reproducing  equipment,  with  respect  to  such  factors  as  tape  speed,  track 
configuration,  response  curve  equalization,  etc.,  could  be  quite  serious.   It  was 
therefore  recommended  that  an  effort  be  made  to  develop  equipment  specifications 
as guidelines.
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One  need  repeatedly  stressed  was  for  the  development  of  both  hardware 
and  software  for  interfacing  with  human  beings  so  that  meaningful  measurements 
and specifications can be made. Otherwise, little more than talk is possible.

Dissemination  of  Information 

A  serious  need  for  better  publicity  was  recognized.   It  was  generally 
believed  that  many  potential  users  of  time  compressed  or  expanded  speech  are  failing 
to explore its possibilities, simply because they are unaware of its existence.
Other  people,  though  aware,  find  it  difficult  to  keep  themselves  informed  because 
of  the  absence  of  a  convenient  source  of  inquiry.   A  variety  of  recommendations 
to  alleviate  this  situation  were  made.   They  included  the  compiling  of  a  mailing 
list  and  the  distribution  of  newsletters,  research  reports,  annotated  bibliographies, 
and  demonstration  tapes  or  records.   The  establishment  of  a  speakers'  bureau  was 
recommended.   It  was  suggested  that  advantage  be  taken  of  existing  dissemination 
facilities,  such  as  the  Educational  Research  Information  Center  (ERIC).   The 
presentation  of  instructional  seminars  for  researchers,  and  workshops  for  educators 
and  other  users  of  time  compressed  or  expanded  speech,  was  strongly  recommended. 

Problems  of  Distribution 

As  matters  presently  stand,  the  equipment  required  for  the  satisfactory 
time compression or expansion of speech is far too expensive for individual ownership.
The only feasible alternative appears to be the establishment of a center
(or centers) where economic production can be achieved through volume. This
arrangement, of course, implies some system of distribution, and it was strongly
urged that considerable thought be given to the orderly development of a distribution
system.

The  Implementation  Committee 

In  an  effort  to  rescue  this  conference  from  a  fate  that  frequently  befalls 
conferences of this sort, an implementation committee (see Appendix B) was appointed
and  charged  with  the  responsibility  of  promoting  positive  action  on  the  recommendations 
arising  from  the  conference.   The  committee  held  its  first  meeting  immediately  upon 
conclusion  of  the  conference.   During  this  meeting,  plans  were  made  for  the 
establishment  of  the  Center  For  Rate  Controlled  Recordings,  at  the  University  of 
Louisville,  and  the  committee  transformed  itself  into  the  Advisory  Board  for  the 
Center.   The  Center  has  now  been  established  in  space  made  available  by  the  University 
of  Louisville  and  is  prepared  to  offer  service.   Its  capacity,  due  to  a  temporarily 
inadequate  financial  base,  is  limited.   However,  application  will  shortly  be  made 
to  an  appropriate  funding  agency  for  financial  support  of  the  Center's  operations. 
As  conceived  by  the  Advisory  Board,  the  Center's  functions  will  include  the 
provision,  at  a  modest  cost,  of  rate  controlled  recorded  speech,  the  preparation 
and  distribution  of  newsletters,  technical  reports,  and  other  similar  information, 
and the conduct of research on the technological and psychoeducational aspects of
rate  controlled  speech.   The  Advisory  Board  has  held  several  meetings  since  the 
adjournment  of  the  conference,  and  it  plans  to  meet  two  or  three  times  yearly. 
When  the  Center  For  Rate  Controlled  Recordings  has  been  brought  to  full  scale 
operation,  many  of  the  recommendations  of  the  conference  will  have  been  realized. 
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